Zing Forum

Northern Thai LLM: Evaluation Framework for Dialect Understanding Capabilities of Large Language Models

This project builds a complete evaluation framework for large language models on the translation task between the Northern Thai dialect (Lanna) and Standard Thai, and reports that LoRA fine-tuning significantly improves model performance on this minority language.

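Tags: large language models, low-resource languages, Thai, Lanna, LoRA fine-tuning, machine translation, dialect understanding, AI fairness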
Published 2026-05-13 02:56 | Recent activity 2026-05-13 03:03 | Estimated read 5 min

Section 01

Introduction: Northern Thai LLM

The project evaluates how well large language models translate between the Northern Thai dialect (Lanna) and Standard Thai, and uses LoRA fine-tuning to improve their performance on this minority language.


Section 02

Project Background: Linguistic Diversity and AI Fairness

Lanna (ISO 639-3: nod; Glottocode: nort2740) is spoken by millions of people in Northern Thailand and differs significantly from Standard Thai (ISO 639-3: tha; Glottocode: thai1261). Although it has its own writing system (Lanna script), digital resources and internet content in it are severely lacking. This data scarcity makes Lanna a typical low-resource language scenario, and a useful test of how far large language models can handle non-mainstream languages.


Section 03

Three-Layer Architecture Design

The project adopts a clear three-layer architecture, with each layer named after a Lanna cultural item:


Section 04

Layer 1: lanna_khuang (Data Layer)

"Khuang" means container in Lanna culture; this layer is responsible for containerized data management:

  • Convert the raw corpus from Excel format to JSONL
  • Split the data into stratified training/development/test sets
  • Manage the alt-translation flow
  • Support bidirectional translation: Lanna → Standard Thai, Standard Thai → Lanna
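
The data-layer steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the field names (`lanna`, `thai`, `domain`), the 80/10/10 ratios, and the function names are all assumptions, and the rows are taken to be already loaded from Excel as dictionaries (reading the spreadsheet itself would need a separate library).

```python
import json
import random
from collections import defaultdict


def stratified_split(records, key, ratios=(0.8, 0.1, 0.1), seed=13):
    """Split records into train/dev/test, keeping each stratum's share.

    `key` names the (hypothetical) field used for stratification.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    splits = {"train": [], "dev": [], "test": []}
    for stratum in sorted(groups):
        items = groups[stratum]
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_dev = int(len(items) * ratios[1])
        splits["train"] += items[:n_train]
        splits["dev"] += items[n_train:n_train + n_dev]
        splits["test"] += items[n_train + n_dev:]  # remainder goes to test
    return splits


def bidirectional_pairs(record):
    """Turn one parallel row into two examples, one per translation direction."""
    return [
        {"direction": "nod-tha", "source": record["lanna"], "target": record["thai"]},
        {"direction": "tha-nod", "source": record["thai"], "target": record["lanna"]},
    ]


def to_jsonl(records):
    """Serialize records as JSONL, keeping Thai and Lanna text unescaped."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Splitting before expanding rows into both directions keeps the two directions of the same sentence pair in the same split, which avoids leaking test sentences into training.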

Section 05

Layer 2: lanna_kuafai (Adaptation Layer)

"Kuafai" means bamboo tray, symbolizing bearing and transmission. This layer is responsible for the actual operation of the model:

  • Call frontier model APIs (GPT-4o, Claude, Gemini, DeepSeek-V3)
  • Run inference with open-weight models (Typhoon2, SeaLLM, Qwen2.5, LLaMA-3.1-8B)
  • Fine-tune with LoRA via PEFT (rank r=8)
  • Provide the lanna-kuafai command-line tool
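
To give a sense of what rank r=8 means in practice, the sketch below counts the parameters LoRA adds to a single weight matrix. LoRA freezes the original weight of shape d_out × d_in and trains two small factors of shapes d_out × r and r × d_in. The hidden size of 4096 is an illustrative assumption, not a figure from the project, and which layers the project actually adapts is not stated.

```python
def lora_param_count(d_in, d_out, r=8):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A with B of shape (d_out, r) and A of shape (r, d_in).
    """
    return r * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters in the frozen, full-rank weight matrix."""
    return d_in * d_out


d = 4096  # hypothetical hidden size of one square projection
added = lora_param_count(d, d, r=8)
full = full_param_count(d, d)
print(f"LoRA adds {added} trainable parameters next to {full} frozen ones")
print(f"ratio: {added / full:.6f}")
```

The small ratio is why LoRA is a practical choice for a low-resource setting, where both the training data and the compute budget are limited.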

Section 06

Layer 3: lanna_jorfa (Diagnostic Layer)

"Jorfa" means offering, representing the examination and inspection of the model. This layer focuses on evaluation and analysis:

  • Triple-ChrF scoring (configurable character n-gram orders from 1 to 4)
  • G-statistic calculation
  • Multi-dimensional facet slicing
  • Error typology analysis
  • Manual scoring form (BaiLan)
  • Inter-annotator agreement test using Krippendorff's α (HomPoi)
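
As an illustration of the agreement check in the last item, here is a minimal sketch of Krippendorff's α for nominal labels. It assumes each unit carries a list of labels from different annotators, with `None` marking a missing rating. Whether the project treats its manual scores as nominal, ordinal, or interval is not stated, so the nominal distance used here is an assumption.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of per-item rating lists, one entry per annotator.
    Items with fewer than two ratings are not pairable and are skipped.
    """
    coincidences = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue
        # Every ordered pair of ratings from different annotators
        # contributes 1 / (m - 1) to the coincidence matrix.
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    n = sum(coincidences.values())
    if n == 0:
        raise ValueError("no unit has at least two ratings")

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one label was ever used; alpha is undefined
    return 1.0 - observed / expected
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.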

Section 07

Triple-ChrF Scoring Mechanism

The project adopts a modified ChrF (character-level F-score) evaluation method that reports three scores together:

  1. ChrF_avg: Average F-score
  2. ChrF_max: Best performance
  3. ChrF_diff: Score difference (reflects the instability of model output)

Together, the three scores capture both the overall level of model performance and how much it fluctuates.
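
The post does not say which set of scores the three values are computed over. The sketch below assumes one hypothesis scored against several alternative reference translations, with ChrF_diff taken as the maximum minus the minimum. The ChrF here is a simplified version (whitespace removed, precision and recall averaged over n-gram orders 1 to 4, β = 2), not the project's scorer or a reference implementation.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=4, beta=2.0):
    """Simplified ChrF: F-beta over averaged n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


def triple_chrf(hypothesis, references, max_n=4):
    """Return the average, best, and spread of ChrF over the references."""
    scores = [chrf(hypothesis, ref, max_n) for ref in references]
    return {
        "chrf_avg": sum(scores) / len(scores),
        "chrf_max": max(scores),
        "chrf_diff": max(scores) - min(scores),
    }
```

Counting Python characters means Thai and Lanna combining marks are treated as separate units; a production scorer might normalize or segment the text differently.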


Section 08

Error Typology Analysis

The project defines a five-category error classification system to characterize how the models fail:

  • Lexical-level errors
  • Syntactic-level errors
  • Semantic-level errors
  • Culture-specific item errors
  • Transcription errors
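
One way to make the five categories usable in analysis is to encode them as an enumeration and tally annotated errors per category. The sketch below is illustrative only: the identifier names and the idea of one label per annotated error are assumptions, not the project's schema.

```python
from collections import Counter
from enum import Enum


class ErrorType(Enum):
    """The five error categories listed above (identifiers are hypothetical)."""
    LEXICAL = "lexical"
    SYNTACTIC = "syntactic"
    SEMANTIC = "semantic"
    CULTURE_SPECIFIC_ITEM = "culture_specific_item"
    TRANSCRIPTION = "transcription"


def error_profile(annotations):
    """Return each category's share of all annotated errors.

    Categories that never occur are reported with a share of 0.0.
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    return {e: (counts[e] / total if total else 0.0) for e in ErrorType}
```

Comparing such profiles before and after fine-tuning, or across models, would show which kinds of errors change rather than only whether the overall score moves.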