# Can Large Models Serve as Parliamentary Advisors? A Deep Evaluation of Romanian Legislative Cases

> This article evaluates the reliability of large models as political advisors by comparing six commercial LLMs against the official legislative justification documents of the Romanian Senate. The study finds that frontier models score highly, but every model hallucinates in a task-dependent way: they handle standardized, template-like tasks well yet produce plausible but unsubstantiated reasoning on politically specific proposals.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-03-31T17:27:12.000Z
- Last activity: 2026-04-01T02:20:07.307Z
- Popularity: 138.1
- Keywords: AI in politics, legislative evaluation, large-model reliability, principal-agent theory, bounded rationality, fact-checking
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2603-30028v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2603-30028v1

---

## Introduction: Core Evaluation of Romanian Legislative Cases

The study compares the output of six commercial LLMs with the official justification documents of the Romanian Senate to assess how reliable large models are as political advisors. Frontier models score highly, yet every model shows task-dependent hallucination: performance is strong on standardized, template-like tasks, while politically specific proposals elicit plausible but unsubstantiated reasoning. The study argues that the real risk of AI-assisted political decision-making is contextual ignorance rather than ideological bias, and that "confident errors" in edge cases deserve particular vigilance.

## Research Background: Potential and Risks of AI Entering the Field of Political Decision-Making

As large language models become more capable, their potential for text-processing tasks such as policy analysis and legislative drafting has become evident. Political decision-making, however, is a high-stakes domain: an incorrect legal interpretation can have far-reaching social consequences, and a hallucinated policy rationale can damage democratic credibility. LLM reliability therefore needs rigorous evaluation before these models are introduced into the process.

## Research Design: Romanian Legislative Cases and Evaluation Methods

- **Case Selection**: 15 legislative proposals from the Romanian Senate, each paired with its official "justification document", which serves as the gold standard.
- **Tested Models**: OpenAI (GPT-5-mini, GPT-5-chat), Anthropic (Claude Haiku 4.5), Meta (Llama 4 Maverick, Llama 3.3 70B, Llama 3.1 8B).
- **Evaluation Framework**: Dual verification, combining LLM-as-Judge semantic similarity scoring (1 to 5 points) with a programmatic text-matching algorithm.
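
The summary does not say which text-matching algorithm the study used. As a minimal sketch of what the programmatic half of such a dual check could look like, the Python below computes two common lexical measures with the standard library only: a token-set Jaccard overlap and an order-sensitive ratio from `difflib.SequenceMatcher`. The function names and the two Romanian sentences are illustrative assumptions, not material from the study.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens; \\w is Unicode-aware, so diacritics survive."""
    return re.findall(r"\w+", text.lower())


def jaccard_overlap(generated: str, reference: str) -> float:
    """Share of distinct tokens that the two texts have in common (0.0 to 1.0)."""
    a, b = set(tokenize(generated)), set(tokenize(reference))
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def sequence_ratio(generated: str, reference: str) -> float:
    """Order-sensitive similarity over token sequences (0.0 to 1.0)."""
    return SequenceMatcher(None, tokenize(generated), tokenize(reference)).ratio()


# Hypothetical example texts, not taken from the Senate documents.
generated = "Proiectul modifică Codul fiscal"
reference = "Legea modifică Codul fiscal"
print(jaccard_overlap(generated, reference))  # 0.6
print(sequence_ratio(generated, reference))   # 0.75
```

Lexical measures like these reward shared wording rather than shared meaning, which is presumably why the study pairs text matching with a semantic judge score instead of relying on either signal alone.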

## Key Findings: Model Performance Stratification and Task-Dependent Hallucination

**Model Stratification**:

- Tier 1 (frontier commercial models): Claude Haiku 4.5, GPT-5-chat, and GPT-5-mini, with semantic similarity scores above 4.6 points.
- Tier 2 (open-source models): the Llama series scored significantly lower, with an effect size above 1.4.

**Hallucination Issues**: Every model shows task-dependent hallucination. On tasks built around standardized legal frameworks, performance is good, which is attributed to abundant training data and formulaic language. On politically specific proposals (local issues, novel policies), the models generate unsubstantiated reasoning, including false data and fabricated precedents.
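
The summary reports the tier gap as an effect size above 1.4 without naming the statistic. Assuming it is a standardized mean difference such as Cohen's d (an assumption, not something the source confirms), the calculation could be sketched as follows. The scores in the usage lines are made-up placeholders on the 1 to 5 judge scale, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference between two groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Placeholder judge scores for illustration only.
tier_1_scores = [3, 4, 5]
tier_2_scores = [1, 2, 3]
print(cohens_d(tier_1_scores, tier_2_scores))  # 2.0
```

By the usual rule of thumb, a d of 0.8 already counts as large, so a value above 1.4 would indicate that the two score distributions overlap only modestly.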

## Theoretical Framework: Principal-Agent and Cascading Bounded Rationality

- **Principal-Agent Theory**: Politicians (principals) delegate policy tasks to AI systems (boundedly rational agents), which creates a structural information asymmetry.
- **Cascading Bounded Rationality**: A chain of boundedly rational actors (politicians → AI agents → evaluators), in which errors propagate and amplify from one level to the next.
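
One way to build intuition for why errors accumulate along such a chain is a toy calculation. The sketch below assumes that each stage introduces an error independently and that no later stage corrects an earlier one. Both assumptions, the function name, and the 10% rates are illustrative and do not come from the study, which describes the cascade qualitatively.

```python
from math import prod


def chain_error_probability(stage_error_rates: list[float]) -> float:
    """Probability that at least one stage in the chain introduces an error,
    assuming independent stages and no downstream correction."""
    return 1.0 - prod(1.0 - rate for rate in stage_error_rates)


# Hypothetical 10% error rate at each of three stages:
# politician's framing, AI draft, evaluator's check.
print(round(chain_error_probability([0.1, 0.1, 0.1]), 3))  # 0.271
```

Even under these simple assumptions the chain is less reliable than any single stage, and the toy model does not capture amplification, where an early error makes later ones more likely.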

## Key Risks and Policy Implications

**Key Risks**: The real risk is contextual ignorance (insufficient coverage of specific political contexts in the training data), which makes errors difficult to predict and detect.

**Policy Recommendations**:

1. Tiered usage: treat model output as a draft that is subject to human review.
2. Context awareness: rely less on AI for sensitive or novel issues.
3. Verification mechanisms: combine fact-checking with checks of the reasoning's logic.
4. Transparency: label the scope of AI involvement.
5. Continuous monitoring: regularly evaluate real-world effects.

## Research Limitations and Future Directions

- **Limitations**: Small sample size (15 cases), geographical limitation (Romania only), and potential bias in the LLM-as-Judge scoring.
- **Future Directions**: Extend the evaluation to the legal systems of more countries, develop hallucination detection tools for the political domain, and explore best practices for human-AI collaboration.