Zing Forum

Can Large Models Serve as Parliamentary Advisors? A Deep Evaluation of Romanian Legislative Cases

This article evaluates how reliable large language models are as political advisors by comparing the output of six commercial LLMs against the official legislative justification documents of the Romanian Senate. The study finds that the frontier models score highly, but every model shows task-dependent hallucination: models perform well on standardized template tasks yet produce plausible but unsubstantiated reasoning on politically specific proposals.

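Tags: AI in politics · Legislative evaluation · Large model reliability · Principal-agent theory · Bounded rationality · Fact-checking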
Published 2026-04-01 01:27 · Recent activity 2026-04-01 10:20 · Estimated read 6 min

Section 01

[Introduction] Can Large Models Serve as Parliamentary Advisors? Core Evaluation of Romanian Legislative Cases

This article evaluates how reliable large language models are as political advisors by comparing six commercial LLMs against the official legislative justification documents of the Romanian Senate. Key findings: the frontier models score highly, but every model shows task-dependent hallucination. Models perform well on standardized template tasks yet produce plausible but unsubstantiated reasoning on politically specific proposals. The study argues that the real risk of AI-assisted political decision-making is contextual ignorance rather than ideological bias, and that "confident errors" in edge cases deserve particular vigilance.

Section 02

Research Background: Potential and Risks of AI Entering the Field of Political Decision-Making

As large language models become more capable, their potential for text-processing tasks such as policy analysis and legislative drafting has become evident. Political decision-making, however, is high-stakes: an incorrect legal interpretation can have far-reaching social consequences, and a hallucinated policy rationale can damage democratic credibility. LLM reliability therefore needs rigorous evaluation before these models are introduced into the legislative process.

Section 03

Research Design: Romanian Legislative Cases and Evaluation Methods

  • Case Selection: 15 legal proposals from the Romanian Senate, each paired with its official "justification document" (the gold standard)
  • Tested Models: OpenAI (GPT-5-mini, GPT-5-chat), Anthropic (Claude Haiku 4.5), Meta (Llama 4 Maverick, Llama 3.3 70B, Llama 3.1 8B)
  • Evaluation Framework: double verification, combining LLM-as-Judge semantic similarity scoring (1-5 points) with a programmatic text matching algorithm
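The double verification idea can be sketched in a few lines of Python. This is only an illustration: the summary does not say which text matching algorithm the study used, so the token-level Jaccard overlap and the `difflib` sequence ratio below are stand-ins, and the function names and the thresholds `judge_min` and `overlap_min` are hypothetical values chosen for the example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (Unicode-aware, so Romanian diacritics are kept)."""
    return re.findall(r"\w+", text.lower())


def jaccard_overlap(generated: str, reference: str) -> float:
    """Share of distinct tokens that the two texts have in common (0.0 to 1.0)."""
    a, b = set(tokenize(generated)), set(tokenize(reference))
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def sequence_ratio(generated: str, reference: str) -> float:
    """Order-sensitive similarity of the two token sequences (0.0 to 1.0)."""
    return SequenceMatcher(None, tokenize(generated), tokenize(reference)).ratio()


def flag_for_review(
    judge_score: float,
    generated: str,
    reference: str,
    judge_min: float = 4.0,    # hypothetical cut-off on the 1-5 judge scale
    overlap_min: float = 0.2,  # hypothetical cut-off on token overlap
) -> bool:
    """Flag an output when either the judge score or the text overlap is low."""
    return judge_score < judge_min or jaccard_overlap(generated, reference) < overlap_min
```

Requiring both checks to pass means a fluent output that the judge rates highly is still flagged when it shares almost no vocabulary with the official document, which is the "confident error" pattern the study warns about.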

Section 04

Key Findings: Model Performance Stratification and Task-Dependent Hallucination

Model Stratification:

  • Tier 1 (frontier commercial models): Claude Haiku 4.5, GPT-5-chat, and GPT-5-mini, each with semantic similarity above 4.6 points
  • Tier 2 (open-source models): the Llama series scored significantly lower, with an effect size above 1.4

Hallucination Issues: All models show task-dependent hallucination. They perform well on standardized legal framework tasks, where training data is abundant and the language is formulaic. On politically specific proposals (local issues, innovative policies), they generate unsubstantiated reasoning such as false data and fabricated precedents.
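For readers unfamiliar with effect sizes, the sketch below shows how a standardized mean difference between two groups of judge scores can be computed. It assumes the reported effect size is Cohen's d with a pooled standard deviation, which the summary does not confirm, and the function name `cohens_d` is introduced here for illustration. Any scores passed to it in an example would be placeholders, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference between two groups, using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

By the common rule of thumb, values above 0.8 already count as a large effect, so a value above 1.4 indicates that the score distributions of the two tiers overlap relatively little.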

Section 05

Theoretical Framework: Principal-Agent and Cascading Bounded Rationality

Principal-Agent Theory: Politicians (principals) delegate policy tasks to AI (a boundedly rational agent), which creates a structural information asymmetry between the two.

Cascading Bounded Rationality: Boundedly rational politicians → AI agents → evaluators. Each level in this chain is itself boundedly rational, so errors propagate and amplify from one level to the next.

Section 06

Key Risks and Policy Implications

Key Risks: The real risk is contextual ignorance (training data covers specific political contexts poorly), which makes errors difficult to predict and detect.

Policy Recommendations:

  1. Tiered usage: Human review of draft outputs
  2. Context awareness: Reduce AI reliance on sensitive/innovative issues
  3. Verification mechanisms: Fact-checking + logical inspection
  4. Transparency: Label the scope of AI involvement
  5. Continuous monitoring: Regular evaluation of actual effects

Section 07

Research Limitations and Future Directions

Limitations: Small sample size (15 cases), geographical limitation (Romania), and potential bias in LLM-as-Judge scoring.

Future Directions: Extend the evaluation to the legal systems of more countries, develop hallucination detection tools for the political domain, and explore best practices for human-AI collaboration.