# Northern Thai LLM: Evaluation Framework for Dialect Understanding Capabilities of Large Language Models

> This project builds a complete evaluation framework for large language models on translation between the Northern Thai dialect (Lanna) and Standard Thai, and uses LoRA fine-tuning to improve model performance on this low-resource minority language.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-12T18:56:07.000Z
- Last activity: 2026-05-12T19:03:19.414Z
- Popularity: 159.9
- Keywords: large language models, low-resource languages, Thai, Lanna, LoRA fine-tuning, machine translation, dialect understanding, AI fairness
- Page link: https://www.zingnex.cn/en/forum/thread/northern-thai-llm
- Canonical: https://www.zingnex.cn/forum/thread/northern-thai-llm
- Markdown source: floors_fallback

---

## Project Background: Linguistic Diversity and AI Fairness

Lanna (ISO 639-3: nod; Glottocode: nort2740) is a dialect spoken by millions of people in Northern Thailand, and it differs significantly from Standard Thai (ISO 639-3: tha; Glottocode: thai1261). Although it has its own writing system (Lanna script), digital resources and internet content in the language are scarce. This scarcity makes Lanna a typical low-resource language, and a useful test of how far large language models can go with non-mainstream languages.

## Three-Layer Architecture Design

The project adopts a clear three-layer architecture, with each layer named after a Lanna cultural item:

## Layer 1: lanna_khuang (Data Layer)

"Khuang" means container in Lanna culture; this layer holds and manages the data:

- Convert the raw corpus from Excel format to JSONL
- Split the corpus into training/development/test sets with stratification
- Manage the alt-translation flow
- Support translation in both directions: Lanna → Standard Thai and Standard Thai → Lanna
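The split and the bidirectional expansion can be sketched as follows. This is a minimal illustration, not the project's code: the field names `nod`, `tha`, and `domain`, the 80/10/10 ratios, and the direction labels are all assumptions, and the records are placeholders rather than real corpus rows (reading the Excel file itself is left out).

```python
import json
import random
from collections import defaultdict

def stratified_split(records, key, ratios=(0.8, 0.1, 0.1), seed=13):
    """Split records into train/dev/test so each stratum keeps the same proportions."""
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[key]].append(rec)

    rng = random.Random(seed)
    splits = {"train": [], "dev": [], "test": []}
    for stratum in sorted(by_stratum):
        group = by_stratum[stratum]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_dev = int(len(group) * ratios[1])
        splits["train"] += group[:n_train]
        splits["dev"] += group[n_train:n_train + n_dev]
        splits["test"] += group[n_train + n_dev:]  # remainder goes to test
    return splits

def bidirectional_pairs(rec):
    """Expand one parallel record into one example per translation direction."""
    return [
        {"direction": "nod->tha", "source": rec["nod"], "target": rec["tha"]},
        {"direction": "tha->nod", "source": rec["tha"], "target": rec["nod"]},
    ]

def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII script readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Placeholder rows standing in for the converted Excel corpus (hypothetical schema).
records = [
    {"nod": f"nod sentence {i}", "tha": f"tha sentence {i}",
     "domain": "daily" if i % 2 == 0 else "culture"}
    for i in range(20)
]
splits = stratified_split(records, key="domain")
train_examples = [ex for rec in splits["train"] for ex in bidirectional_pairs(rec)]
train_jsonl = to_jsonl(train_examples)
```

Splitting before expanding into directions keeps both directions of a sentence pair in the same split, which avoids leaking test sentences into training.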

## Layer 2: lanna_kuafai (Adaptation Layer)

"Kuafai" means bamboo tray, symbolizing carrying and passing along. This layer is responsible for actually running the models:

- Calls to frontier model APIs (GPT-4o, Claude, Gemini, DeepSeek-V3)
- Inference with open-weight models (Typhoon2, SeaLLM, Qwen2.5, LLaMA-3.1-8B)
- LoRA fine-tuning with PEFT (rank r=8)
- Provide the `lanna-kuafai` command-line tool
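To give a sense of what rank r=8 means in practice, the arithmetic below counts the parameters LoRA adds to a single weight matrix. LoRA keeps the original `d_out × d_in` weight frozen and trains two small factors of shapes `d_out × r` and `r × d_in`. The choice of a 4096 × 4096 attention projection (the hidden size of LLaMA-3.1-8B) is only an illustration; which modules the project actually adapts is not stated here.

```python
def lora_param_count(d_out, d_in, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_out + d_in)  # B is d_out x r, A is r x d_in

d = 4096                      # illustrative: a square 4096 x 4096 projection
full = d * d                  # parameters in the frozen matrix
added = lora_param_count(d, d, r=8)
ratio = added / full

print(f"frozen: {full:,}  trainable: {added:,}  ratio: {ratio:.4%}")
```

At this rank the trainable factors are well under one percent of the matrix they adapt, which is why fine-tuning an 8B-parameter model this way is feasible on modest hardware.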

## Layer 3: lanna_jorfa (Diagnostic Layer)

"Jorfa" means offering, representing the examination and inspection of the model. This layer focuses on evaluation and analysis:

- Triple-ChrF scoring (character n-gram orders from 1 to 4)
- G-statistic calculation
- Multi-dimensional facet slicing
- Error typology analysis
- Human rating form (BaiLan)
- Inter-annotator agreement via Krippendorff's α (HomPoi)
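For the agreement check in the last item, Krippendorff's α is defined as 1 − D_o/D_e, where D_o is the observed disagreement within items and D_e is the disagreement expected by chance. The sketch below is a generic implementation of that definition, not the HomPoi module itself; the input layout (one list of ratings per item) and the example ratings are assumptions.

```python
from collections import defaultdict
from itertools import permutations

def krippendorff_alpha(units, distance=lambda a, b: float(a != b)):
    """units: one list of ratings per item, with missing ratings simply omitted.

    The default distance treats ratings as nominal labels; pass
    lambda a, b: (a - b) ** 2 for interval-scaled scores.
    """
    coincidences = defaultdict(float)
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a single rating says nothing about agreement
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = defaultdict(float)
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w * distance(a, b) for (a, b), w in coincidences.items()) / n
    expected = sum(
        totals[a] * totals[b] * distance(a, b) for a in totals for b in totals
    ) / (n * (n - 1))
    if expected == 0:
        return float("nan")  # every rating identical: alpha is undefined
    return 1.0 - observed / expected

# Hypothetical ratings: two annotators labelling three items.
alpha_agree = krippendorff_alpha([["good", "good"], ["bad", "bad"], ["good", "good"]])
alpha_clash = krippendorff_alpha([["good", "bad"], ["good", "bad"]])
# Hypothetical 1-5 adequacy scores from three annotators, interval distance.
alpha_scores = krippendorff_alpha(
    [[4, 4, 5], [2, 2, 2], [5, 4, 5], [1, 2, 1]],
    distance=lambda a, b: (a - b) ** 2,
)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values indicate systematic disagreement.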

## Triple-ChrF Scoring Mechanism

The project uses an extended form of ChrF (character n-gram F-score) that reports three values at once:

1. **ChrF_avg**: the average F-score
2. **ChrF_max**: the best score
3. **ChrF_diff**: the score difference (reflects the instability of model output)

Together, these three values capture both the overall level of model performance and how much it fluctuates.
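A simplified version of this scoring can be sketched as below. Several points are assumptions rather than the project's definitions: the three values are taken over several model outputs scored against one reference, ChrF_diff is computed as maximum minus minimum, spaces are dropped before counting n-grams, and β=2 follows the common chrF default. The romanized strings are placeholders, not corpus data.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (an assumption of this sketch)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def chrf(hypothesis, reference, max_n=4, beta=2.0):
    """Simplified chrF: n-gram precision and recall averaged over orders 1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to contain n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

def triple_chrf(scores):
    """Aggregate several scores; defining diff as max - min is an assumption."""
    return {
        "ChrF_avg": sum(scores) / len(scores),
        "ChrF_max": max(scores),
        "ChrF_diff": max(scores) - min(scores),
    }

# Placeholder strings: three sampled outputs scored against one reference.
reference = "sawatdi khrap"
outputs = ["sawatdi khrap", "sawatdi kha", "sawasdee"]
report = triple_chrf([chrf(out, reference) for out in outputs])
```

Character-level matching suits Thai-family scripts, where text is written without spaces between words and word-level metrics would depend on a tokenizer.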

## Error Typology Analysis

The project defines a five-category error classification to characterize how models fail:

- Lexical-level errors
- Syntactic-level errors
- Semantic-level errors
- Culture-specific item errors
- Transcription errors
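One way to represent these five categories and summarize annotated errors is shown below. The class name, function name, and sample annotations are hypothetical; the project's own data structures are not described here.

```python
from collections import Counter
from enum import Enum

class ErrorType(Enum):
    LEXICAL = "lexical"
    SYNTACTIC = "syntactic"
    SEMANTIC = "semantic"
    CULTURAL = "culture-specific item"
    TRANSCRIPTION = "transcription"

def error_profile(annotations):
    """Return the share of each error category that occurs in the annotations."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {t: counts[t] / total for t in ErrorType if counts[t]}

# Hypothetical annotations for four erroneous outputs of one model.
annotations = [
    ErrorType.LEXICAL,
    ErrorType.LEXICAL,
    ErrorType.SEMANTIC,
    ErrorType.TRANSCRIPTION,
]
profile = error_profile(annotations)
for error_type, share in profile.items():
    print(f"{error_type.value}: {share:.0%}")
```

Comparing such profiles across models or across facets shows whether a model mostly picks wrong words, garbles the script, or misses culture-bound expressions.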
