Section 01
[Introduction] Evaluation of LLMs on Vietnamese Legal Texts: Key Findings and Challenges
This article evaluates four large language models (GPT-4o, Claude 3 Opus, Gemini 1.5 Pro, and Grok-1) on the task of simplifying Vietnamese legal texts. It uses a dual evaluation framework that combines quantitative performance benchmarking with qualitative error analysis. The evaluation reveals trade-offs among accuracy, readability, and consistency across the models and identifies insufficient legal reasoning ability as the core challenge for current LLMs. The article also outlines its methodological contributions and the practical implications of its findings.