# Claim-Level Evidence Admissibility: A New Framework to Enhance the Reliability of Structured Outputs from Large Language Models

> This study proposes a new framework called EGCR (Claim-Level Evidence Admissibility) to improve the reliability of structured outputs generated by large language models (LLMs). By conducting evidence admissibility assessment at the claim level, this method can effectively identify and filter unreliable model outputs, making it particularly suitable for high-risk scenarios such as cybersecurity risk assessment and AI deployment decision-making.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-16T21:43:10.000Z
- Last activity: 2026-06-16T21:51:09.635Z
- Popularity: 148.9
- Keywords: large language models, structured output, evidence admissibility, output reliability, cybersecurity, AI risk assessment, claim-level assessment
- Page link: https://www.zingnex.cn/en/forum/thread/claim-level-evidence-admissibility
- Canonical: https://www.zingnex.cn/forum/thread/claim-level-evidence-admissibility

---

## [Introduction] EGCR Framework: A New Solution to Enhance the Reliability of Structured Outputs from Large Language Models

The EGCR (Claim-Level Evidence Admissibility) framework assesses evidence admissibility for each claim in a large language model's (LLM's) structured output and filters out the claims that fail the assessment. It targets high-risk scenarios such as cybersecurity risk assessment and AI deployment decision-making. The framework comes from a research team at Nanchang University and was released on GitHub in 2024.

## Research Background and Challenges

LLMs show great potential for structured output tasks such as data extraction and risk assessment, but they tend to generate incorrect or unsupported content, which can have serious consequences in high-risk scenarios. Existing methods verify an output as a whole, and this fails in one of two ways: rejecting the whole output discards a large amount of correct information because of a few errors, while accepting it lets those errors through. A fine-grained, claim-level assessment mechanism is therefore needed.
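The difference between whole-output verification and claim-level filtering can be illustrated with a minimal sketch. The `Claim` class, the `supported` flag, and the sample claims are hypothetical and exist only for illustration; they are not taken from the EGCR implementation.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    supported: bool  # hypothetical label: is the claim backed by evidence?


def whole_output_verify(claims):
    """Accept the entire output only if every claim is supported."""
    return list(claims) if all(c.supported for c in claims) else []


def claim_level_filter(claims):
    """Keep each supported claim independently of the others."""
    return [c for c in claims if c.supported]


# Toy output with two supported claims and one unsupported claim.
claims = [
    Claim("The login form concatenates user input into a SQL query.", True),
    Claim("The search page reflects unescaped user input.", True),
    Claim("The server runs an outdated TLS library.", False),
]

kept_whole = whole_output_verify(claims)
kept_claim = claim_level_filter(claims)
print(len(kept_whole), len(kept_claim))
```

In this toy case a single unsupported claim causes whole-output verification to discard the two supported claims as well, while claim-level filtering keeps them.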

## Core Concepts of the EGCR Framework

1. Fine-grained claim-level assessment: Break the output down into independent claim units and judge the evidence support for each claim individually.
2. Evidence admissibility criteria: Evidence existence, relevance, sufficiency, and consistency.
3. Selective registration mechanism: Only claims that pass the assessment are included in the final output.
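The three concepts above can be sketched as a small filter. This is an illustrative sketch, not the authors' implementation: the `Evidence` and `Claim` classes, the numeric `relevance` score, the `contradicts` flag, and the default thresholds are all assumptions chosen to make the four criteria concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    text: str
    relevance: float           # hypothetical relevance score in [0, 1]
    contradicts: bool = False  # True if this evidence conflicts with the claim


@dataclass
class Claim:
    text: str
    evidence: list[Evidence] = field(default_factory=list)


def is_admissible(claim: Claim, min_relevance: float = 0.5, min_items: int = 1) -> bool:
    """Check the four criteria in order; thresholds are assumed values."""
    if not claim.evidence:                      # existence
        return False
    relevant = [e for e in claim.evidence if e.relevance >= min_relevance]
    if not relevant:                            # relevance
        return False
    if len(relevant) < min_items:               # sufficiency
        return False
    if any(e.contradicts for e in relevant):    # consistency
        return False
    return True


def register(claims, **thresholds):
    """Selective registration: keep only claims that pass the assessment."""
    return [c for c in claims if is_admissible(c, **thresholds)]


claims = [
    Claim("Endpoint /login is vulnerable to SQL injection.",
          [Evidence("Query is built by string concatenation.", 0.9)]),
    Claim("Endpoint /search is vulnerable to XSS.", []),
    Claim("Passwords are stored in plaintext.",
          [Evidence("Passwords are hashed before storage.", 0.8, contradicts=True)]),
]

registered = register(claims)
print([c.text for c in registered])
```

Only the first claim is registered here: the second has no evidence at all, and the third is contradicted by its own evidence. Raising `min_items` to 2 would reject the first claim as well, since it has a single supporting item.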

## Experimental Design and Datasets

Two controlled datasets were constructed:

1. Cybersecurity risk suite: Covers vulnerability cases such as SQL injection and XSS.
2. AI deployment risk suite: Focuses on compliance and ethical risks such as data privacy and algorithmic fairness.

## Local Model Experiments and Evaluation

Five open-source models were tested locally: Granite4.1 (8B), Llama3.1 (8B), Qwen2.5 (7B), Qwen3 (30B), and Gemma4 (31B). The results show that the EGCR framework improves output quality across these models and that the improvement is consistent from one model to another.

## Experimental Results and Findings

1. Strategy trade-off: Adjusting the evidence threshold balances reliability against information completeness.
2. Multi-dimensional assessment: The evaluation covers dimensions such as consistency, selectivity, and contradiction detection.
3. Baseline comparison: EGCR outperforms traditional methods such as whole-output verification and confidence filtering in balancing precision and recall.
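The threshold trade-off in the first finding can be made concrete with a short sketch. The evidence scores and correctness labels below are made-up toy values, not results from the study, and `precision_recall` is a hypothetical helper; returning 1.0 for precision when no claim is kept is a convention chosen for this sketch.

```python
def precision_recall(scored_claims, threshold):
    """scored_claims: list of (evidence_score, is_correct) pairs."""
    kept = [ok for score, ok in scored_claims if score >= threshold]
    total_correct = sum(ok for _, ok in scored_claims)
    true_pos = sum(kept)
    # Precision: share of kept claims that are correct.
    precision = true_pos / len(kept) if kept else 1.0
    # Recall: share of correct claims that were kept.
    recall = true_pos / total_correct if total_correct else 1.0
    return precision, recall


# Toy data: (evidence score, whether the claim is actually correct).
scored = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.2, False)]

for threshold in (0.3, 0.55, 0.7):
    p, r = precision_recall(scored, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, the lowest threshold keeps every correct claim but also admits an incorrect one, while the highest threshold admits only correct claims at the cost of dropping one of them. That is the reliability versus completeness balance described above.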

## Research Significance and Application Prospects

- Application scenarios: Cybersecurity (reducing false positives and false negatives in vulnerability assessment) and AI governance (compliance risk assessment).
- Extensibility: The approach can be extended to fields such as healthcare, law, and finance.
- Open-source implementation: It provides a foundation for subsequent research and supports further work on LLM reliability assessment.
