# Large Language Models Already Possess Self-Evaluation Capabilities: The SEE Method Unlocks Latent Judge-Score Calibration with Only 160 Samples

> Researchers have found that large language models (LLMs) can predict the scores an external judge would assign without specialized training. The proposed Self-Evaluation Elicitation (SEE) method unlocks this latent ability with only 160 samples, making it about 31 times more data-efficient than a traditional reinforcement learning baseline.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-03T17:27:16.000Z
- Last activity: 2026-06-04T05:51:09.313Z
- Popularity: 147.6
- Keywords: large language models, self-evaluation, model calibration, reinforcement learning, data efficiency, model-based judging, machine learning, natural language processing
- Page link: https://www.zingnex.cn/en/forum/thread/see160
- Canonical: https://www.zingnex.cn/forum/thread/see160
- Markdown source: floors_fallback

---

## [Introduction] Latent Self-Evaluation Capabilities of Large Language Models Can Be Efficiently Unlocked via the SEE Method

Key Points: The study found that base large language models already have a latent self-evaluation capability: they can predict the score an external judge would give without specialized training. The proposed Self-Evaluation Elicitation (SEE) method unlocks this ability with only 160 samples, about 31 times fewer than the traditional reinforcement learning baseline needs. The capability transfers to unseen judges and does not reduce answer quality, which makes it relevant to both model optimization and deployment.

## Research Background and Core Questions

As the capabilities of large language models (LLMs) improve, evaluating output quality has become a key challenge. The common approach today is 'model judging model', which raises the core question: can a model predict the score a judge would give to its own output? The study found that this self-evaluation ability already exists in base models and only needs the right method to unlock it. With few-shot prompts alone, the model predicts external judges' scores with accuracy well above chance.
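One way to check the "better than chance" claim is to compare the error of the model's self-predicted scores against the error of random guesses on the same scale. The sketch below is only an illustration: the 1-10 score scale, the sample scores, and the helper names `mean_abs_error` and `random_baseline_error` are assumptions, not details from the study.

```python
import random
from statistics import mean

def mean_abs_error(predicted, judged):
    """Mean absolute gap between self-predicted and judge-assigned scores."""
    if len(predicted) != len(judged):
        raise ValueError("score lists must have the same length")
    return mean(abs(p - j) for p, j in zip(predicted, judged))

def random_baseline_error(judged, low=1, high=10, trials=1000, seed=0):
    """Average error of uniformly random integer guesses on an assumed scale."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        guesses = [rng.randint(low, high) for _ in judged]
        errors.append(mean_abs_error(guesses, judged))
    return mean(errors)

# Hypothetical scores on an assumed 1-10 scale (not data from the study).
judge_scores = [7, 4, 9, 6, 3, 8]
self_predictions = [6, 5, 9, 7, 4, 7]

model_error = mean_abs_error(self_predictions, judge_scores)
baseline_error = random_baseline_error(judge_scores)
print(f"model MAE={model_error:.2f}, random MAE={baseline_error:.2f}")
```

A model error clearly below the random baseline is the kind of evidence the paragraph above describes; the study's own metric may differ.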

## SEE Method: A Two-Stage Unlocking Framework

The SEE method is a two-stage training framework:
### Stage 1: Calibration-Coupled Reinforcement Learning
This stage optimizes two objectives at once: improving answer quality and training the model to predict the judge's score. Because the two objectives are coupled ('calibration coupling'), the model learns to produce good answers while accurately predicting how they will be scored.
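
The source does not give the reward formula, so the following is a minimal sketch of one way such a coupled objective could look: a quality term from the judge's score minus a penalty for mispredicting that score. The function name `coupled_reward`, the 0-10 scale, and the weight `calibration_weight` are all hypothetical.

```python
def coupled_reward(judge_score, predicted_score,
                   max_score=10.0, calibration_weight=0.5):
    """Combine answer quality and self-prediction accuracy into one reward.

    Assumed form (not from the study): normalized judge score minus a
    weighted, normalized absolute prediction error.
    """
    quality = judge_score / max_score
    calibration_penalty = abs(predicted_score - judge_score) / max_score
    return quality - calibration_weight * calibration_penalty

# Same answer quality, but the second rollout mispredicts its own score.
well_calibrated = coupled_reward(judge_score=8, predicted_score=8)
poorly_calibrated = coupled_reward(judge_score=8, predicted_score=4)
print(well_calibrated, poorly_calibrated)
```

Under this assumed form, two answers of equal quality receive different rewards when one of them predicts its score less accurately, which is the coupling the stage description refers to.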
### Stage 2: Masked Distillation
With answer generation held unchanged, this stage optimizes only the score-prediction part, so self-evaluation improves without degrading answer quality.
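
One plausible reading of "masked" here is a loss mask: the training loss is computed only on the tokens that express the predicted score, so answer tokens contribute no gradient. The sketch below illustrates that idea under this assumption; `masked_loss` and the example values are hypothetical, and the study's actual distillation target is not described in the source.

```python
def masked_loss(per_token_losses, is_score_token):
    """Average the per-token loss over score-prediction positions only.

    Answer tokens (mask False) are excluded, so they would not be updated
    by this objective.
    """
    if len(per_token_losses) != len(is_score_token):
        raise ValueError("losses and mask must have the same length")
    selected = [loss for loss, keep in zip(per_token_losses, is_score_token) if keep]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)

# Hypothetical sequence: two answer tokens followed by two score tokens.
losses = [2.0, 1.0, 0.5, 0.3]
mask = [False, False, True, True]
print(masked_loss(losses, mask))
```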

## Stunning Data Efficiency: Efficient Unlocking with 160 Samples

The SEE method is highly data-efficient: only 160 unique samples are needed to achieve significant calibration improvements across three benchmarks. By contrast, the traditional reinforcement learning baseline requires about 5000 samples to reach similar results, so SEE is roughly 31 times more data-efficient. In practice, this means teams with limited resources can also train models with good self-evaluation capabilities and spend less on data annotation.
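
The "31 times" figure follows directly from the two sample counts quoted above (the baseline count is stated only as approximate):

$$
\frac{N_{\text{baseline}}}{N_{\text{SEE}}} \approx \frac{5000}{160} = 31.25 \approx 31
$$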

## Key Findings: Transferable Quality Perception Characteristics

The study reveals three important findings:
1. **Localization Characteristic**: Self-evaluation ability is highly localized in the model's own token distribution. The model evaluates based on intrinsic features of the generated text and does not rely on external rules;
2. **Cross-Judge Stability**: Calibration remains stable even with judges the model was not trained on. What it learns is a general 'quality perception' rather than the preferences of a specific judge;
3. **Answer Quality Preservation**: Answer quality does not decline during training, which resolves the trade-off between improving evaluation ability and preserving generation quality.
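
Cross-judge stability can be checked by scoring the same self-predictions against several judges, including one that was not used in training, and comparing the errors. The following is a small sketch with made-up numbers; the judge names and the helper `per_judge_error` are hypothetical.

```python
def per_judge_error(self_predictions, scores_by_judge):
    """Mean absolute error of the self-predictions against each judge."""
    report = {}
    for judge_name, judge_scores in scores_by_judge.items():
        if len(judge_scores) != len(self_predictions):
            raise ValueError(f"length mismatch for judge {judge_name!r}")
        gaps = [abs(p - j) for p, j in zip(self_predictions, judge_scores)]
        report[judge_name] = sum(gaps) / len(gaps)
    return report

# Hypothetical scores (not data from the study).
self_predictions = [6, 5, 9, 7]
scores_by_judge = {
    "training_judge": [7, 4, 9, 6],
    "held_out_judge": [6, 4, 8, 7],
}
report = per_judge_error(self_predictions, scores_by_judge)
print(report)
```

If the error against the held-out judge stays close to the error against the training judge, that is consistent with the stability finding in item 2.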

## Research Significance and Practical Implications

### Theoretical Level
The study reframes model self-evaluation as a problem of 'unlocking' an existing ability rather than 'acquiring' a new one, which suggests that LLMs may hold other latent capabilities that can be elicited in a similar way.
### Practical Level
1. Reduce deployment costs: Self-evaluation can be used for online quality monitoring, reducing reliance on expensive external judge APIs;
2. Improve inference efficiency: Models can filter out their own low-quality content during generation;
3. Enhance interpretability: Self-evaluation scores provide an intrinsic quality indicator;
4. Promote model iteration: High-quality training data can be screened automatically, forming a virtuous cycle.
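
The filtering and data-screening uses above can be sketched as a simple threshold on the model's self-predicted score. This is an illustrative sketch only: the `(answer, self_score)` pair format, the 0-10 scale, and the threshold value are assumptions.

```python
def filter_by_self_score(candidates, threshold=6.0):
    """Keep answers whose self-predicted score meets the threshold.

    `candidates` is a list of (answer_text, self_score) pairs.
    """
    return [answer for answer, self_score in candidates if self_score >= threshold]

def best_candidate(candidates):
    """Return the answer with the highest self-predicted score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    answer, _ = max(candidates, key=lambda pair: pair[1])
    return answer

# Hypothetical generations with self-predicted scores on an assumed 0-10 scale.
candidates = [("draft A", 4.5), ("draft B", 8.0), ("draft C", 6.0)]
kept = filter_by_self_score(candidates)
print(kept, best_candidate(candidates))
```

No external judge is called in this loop, which is where the cost saving in item 1 would come from.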

## Limitations and Future Research Directions

Limitations: The experiments are mainly based on specific open-ended question-answering tasks, so the method's effect in areas such as code generation and mathematical reasoning still needs to be verified. Future directions: further improving the absolute accuracy of self-evaluation and extending the method to multimodal scenarios.

## Research Summary

This study shows that large language models already have a latent self-evaluation capability, and that the SEE method unlocks it with a concise two-stage design and only 160 samples. This intrinsic quality perception is expected to play an increasingly important role in model optimization, deployment monitoring, and automated iteration.