Zing Forum


Evaluating the In-Context Translation Capabilities of Large Language Models: Key Bottlenecks Revealed by Synchronous Context-Free Grammar Transduction Experiments

Researchers systematically evaluated the in-context translation abilities of large language models using synchronous context-free grammars. They found that performance drops significantly as grammar size and sentence length increase, and is worse for language pairs with large morphological differences.

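Large language models · Machine translation · Low-resource languages · In-context learning · Formal grammars · Synchronous context-free grammars · Language understanding · AI evaluation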
Published 2026-04-09 01:35 · Recent activity 2026-04-09 12:14 · Estimated read 7 min

Section 01

[Introduction] Core Findings on the In-Context Translation Capabilities of Large Language Models

This study systematically evaluated the in-context translation capabilities of large language models using synchronous context-free grammars (SCFGs). Model performance dropped significantly as grammar size and sentence length increased, and was worse on language pairs that differ substantially in morphology and writing system. The study also identified three typical error patterns (lexical recall errors, hallucinated words, and untranslated source words), offering useful guidance for low-resource language translation and model improvement.

Section 02

Research Background and Motivation

Machine translation for low-resource languages is a major challenge in artificial intelligence. Large language models (LLMs) are trained on massive amounts of data, but minority languages often lack such resources. In-context learning, where a model "learns" a new language at inference time from materials supplied in the prompt such as grammar textbooks and dictionaries, is a potential solution, but its effectiveness depends on how well the model understands and applies grammatical descriptions. To measure this capability precisely, the study designed a string transduction evaluation framework based on synchronous context-free grammars (SCFGs).

Section 03

Experimental Design and Methods

Construction of Synchronous Context-Free Grammars

The research team constructed a series of SCFGs. Each grammar defines a pair of formal languages, a source and a target, whose rules are designed to simulate the grammatical features, morphological changes, and writing systems of natural languages. This allows translation ability to be tested in a controlled environment.
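To make the idea of a synchronous grammar concrete, here is a minimal Python sketch, assuming a hypothetical toy grammar (the rules and invented target words are illustrative, not taken from the study). Each rule pairs a source right-hand side with a target right-hand side, and linked nonterminals (written `NP_1`, `VP_2`, and so on) are expanded once and reused on both sides, so every derivation yields an aligned sentence pair. The toy source uses SVO order, while the target uses SOV order with an object particle `ka`.

```python
import random

# Hypothetical toy SCFG (illustrative only, not the study's grammars).
# Each rule is a (source_rhs, target_rhs) pair. Nonterminals carry a link index
# ("NP_1", "VP_2"); a linked nonterminal is expanded once and the same expansion
# is used on both sides, which keeps source and target synchronized.
# Source order is SVO; target order is SOV with an object particle "ka".
RULES = {
    "S":  [(["NP_1", "VP_2"], ["NP_1", "VP_2"])],
    "VP": [(["V_1", "NP_2"], ["NP_2", "ka", "V_1"])],
    "NP": [(["the", "N_1"], ["N_1"])],
    "N":  [(["dog"], ["rapo"]), (["cat"], ["miso"])],
    "V":  [(["sees"], ["tuna"]), (["chases"], ["velo"])],
}


def is_linked(token: str) -> bool:
    """Linked nonterminals look like 'NP_1'; everything else is a terminal."""
    return "_" in token


def derive(symbol: str, rng: random.Random) -> tuple[list[str], list[str]]:
    """Synchronously expand `symbol`; return (source_tokens, target_tokens)."""
    src_rhs, tgt_rhs = rng.choice(RULES[symbol])
    # Expand each linked nonterminal exactly once, keyed by its link index.
    expansions = {}
    for token in src_rhs:
        if is_linked(token):
            nonterminal, link = token.split("_")
            expansions[link] = derive(nonterminal, rng)
    # Splice the shared expansions into each side in that side's own order.
    source, target = [], []
    for token in src_rhs:
        source.extend(expansions[token.split("_")[1]][0] if is_linked(token) else [token])
    for token in tgt_rhs:
        target.extend(expansions[token.split("_")[1]][1] if is_linked(token) else [token])
    return source, target


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the generated pairs are reproducible
    for _ in range(3):
        src, tgt = derive("S", rng)
        print(" ".join(src), "=>", " ".join(tgt))
```

Because each reference translation comes out of the same derivation as its source sentence, a model's output can be scored against an exact answer. In this toy grammar, for instance, `the dog chases the cat` is paired with `rapo miso ka velo`.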

Evaluation Dimensions

The experiments varied the following key factors:

  • Grammar size: from small grammars to large, complex ones, testing how well the model handles rule sets of increasing complexity
  • Sentence length: comparing translation accuracy on short versus long sentences
  • Linguistic feature differences: covering differences in syntactic structure, morphological complexity, and writing system
  • Language pair combinations: multiple pairs with different combinations of linguistic features
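These evaluation dimensions amount to a grid of conditions, with accuracy measured separately in each cell. The sketch below is a hypothetical harness, not the study's code: it assumes illustrative factor levels, a synthetic lexicon standing in for each grammar's lexical rules, a single word-order reversal standing in for syntactic reordering, and exact match as the metric. `translate` is a placeholder for whatever function prompts a model with the grammar and a source sentence. The language-pair dimension is omitted for brevity and could be added as another axis of the grid.

```python
import itertools
import random
from typing import Callable

# Hypothetical factor levels; the study's actual values may differ.
GRAMMAR_SIZES = [10, 50, 200]    # number of lexical rules (word pairs) per grammar
SENTENCE_LENGTHS = [3, 8, 15]    # source tokens per sentence

Lexicon = dict[str, str]
Translator = Callable[[Lexicon, list[str]], list[str]]


def make_lexicon(size: int) -> Lexicon:
    """Invent `size` source/target word pairs as stand-ins for lexical rules."""
    return {f"s{i}": f"t{i}" for i in range(size)}


def make_pair(lexicon: Lexicon, length: int, rng: random.Random) -> tuple[list[str], list[str]]:
    """Sample a source sentence and derive its reference translation."""
    source = rng.choices(list(lexicon), k=length)
    # Assumed reordering rule: the target reverses source word order.
    reference = [lexicon[word] for word in reversed(source)]
    return source, reference


def evaluate(translate: Translator, items_per_cell: int = 20,
             seed: int = 0) -> dict[tuple[int, int], float]:
    """Return exact-match accuracy for every (grammar size, sentence length) cell."""
    rng = random.Random(seed)  # fixed seed so the benchmark is reproducible
    accuracy = {}
    for size, length in itertools.product(GRAMMAR_SIZES, SENTENCE_LENGTHS):
        lexicon = make_lexicon(size)
        pairs = [make_pair(lexicon, length, rng) for _ in range(items_per_cell)]
        # A sentence counts as correct only if the whole output matches the reference.
        correct = sum(translate(lexicon, src) == ref for src, ref in pairs)
        accuracy[(size, length)] = correct / len(pairs)
    return accuracy
```

As a sanity check, an oracle such as `lambda lex, src: [lex[w] for w in reversed(src)]` scores 1.0 in every cell, while a translator that simply copies the source scores 0.0. A real model falls somewhere in between, and scale sensitivity shows up as accuracy falling in the cells with larger grammars and longer sentences.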

Section 04

Core Research Findings

Finding 1: Scale Sensitivity

The model's translation accuracy drops significantly as grammar size and sentence length increase; performance deteriorates when the model must handle complex rule sets or long sentences.

Finding 2: Impact of Morphological and Writing System Differences

Differences in morphology and writing system between the source and target languages severely degrade performance. For example, pairs that combine a morphologically rich language with a morphologically simple one, or that use different writing systems, are harder to translate.

Finding 3: Error Pattern Analysis

Three main types of errors were identified:

  1. Lexical recall errors: producing the wrong target-language word
  2. Hallucinated words: inventing words that do not exist in the target language
  3. Untranslated residue: copying source-language words directly into the output
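When both vocabularies are known, as they are for a synthetic grammar, the three error types can be told apart mechanically. Below is a minimal sketch, assuming a bag-of-words comparison (it ignores word order, so reordering mistakes are not counted) and a hypothetical toy vocabulary. Output tokens not accounted for by the reference are labeled a lexical recall error if they are real target words, untranslated residue if they are source words, and a hallucination otherwise.

```python
from collections import Counter


def classify_errors(output: list[str], reference: list[str],
                    source_vocab: set[str], target_vocab: set[str]) -> Counter:
    """Count output tokens by error type, ignoring word order."""
    # Multiset difference: tokens the output has beyond what the reference contains.
    unexplained = Counter(output) - Counter(reference)
    errors = Counter()
    for token, count in unexplained.items():
        # Target vocabulary is checked first, so a word shared by both
        # languages is counted as a recall error rather than residue.
        if token in target_vocab:
            kind = "lexical_recall"   # a real target word, but the wrong one
        elif token in source_vocab:
            kind = "untranslated"     # source word copied into the output
        else:
            kind = "hallucination"    # word that exists in neither language
        errors[kind] += count
    return errors


# Hypothetical toy vocabularies and sentence pair.
SOURCE_VOCAB = {"the", "dog", "cat", "sees", "chases"}
TARGET_VOCAB = {"rapo", "miso", "ka", "tuna", "velo"}
reference = ["rapo", "miso", "ka", "velo"]
output = ["tuna", "dog", "ka", "velox"]
print(classify_errors(output, reference, SOURCE_VOCAB, TARGET_VOCAB))
```

On this sample pair the function reports one error of each kind: `tuna` (a real target word used in the wrong place), `dog` (left untranslated), and `velox` (an invented word).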

Section 05

Research Significance and Implications

Implications for Low-Resource Language Translation

In-context learning is feasible in principle, but current models still struggle to use grammatical descriptions for translation. Prompting strategies need to be designed carefully, with the limits of model capabilities in mind.

Contribution to Model Evaluation

Formal grammar transduction tasks provide a precise and reproducible testbed that can isolate and measure specific linguistic capabilities.

Future Research Directions

Future work should explore ways to improve models' understanding of complex grammars, reduce the performance loss caused by cross-linguistic differences, and make models more reliable on formal language tasks.

Section 06

Research Conclusion

Through a rigorous experimental design, this study systematically evaluated the in-context translation capabilities of large language models, revealed key bottlenecks in handling complex grammatical rules and cross-linguistic differences, and provides important reference points for improving models and applying them to low-resource language translation.