# Aegis: An LLM Intelligent Routing and Hallucination Detection Gateway Based on Causal Inference

> Aegis is a production-grade LLM gateway that automatically routes prompts to the most cost-effective model via a complexity classifier and uses causal inference technology to detect hallucinations without requiring ground truth labels. The system integrates semantic caching, multi-level risk detection, and real-time cost monitoring, providing a safe and cost-effective LLM invocation solution for high-stakes scenarios.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-09T20:39:23.000Z
- Last activity: 2026-04-09T20:51:51.634Z
- Popularity: 161.8
- Keywords: LLM gateway, causal inference, hallucination detection, intelligent routing, cost optimization, semantic caching, production systems, DoWhy, security gateway
- Page link: https://www.zingnex.cn/en/forum/thread/aegis-llm
- Canonical: https://www.zingnex.cn/forum/thread/aegis-llm
- Markdown source: floors_fallback

---

## Dual Challenges of LLM Applications in Production Environments

When enterprises run large language models in production, they face two intertwined challenges. The first is wasted spend: simple queries are often sent to high-end models such as GPT-4o even though Llama 3.1 (free when run locally) or Gemini Flash ($0.075 per 1M tokens) can answer them at comparable quality. The second is silent hallucination: LLMs produce confident, fluent, but incorrect answers, which can have serious consequences in high-risk domains such as healthcare, law, and finance.

Aegis is designed to address both problems. It is an end-to-end production system that implements intelligent routing and, more importantly, applies causal inference to detect hallucinations without requiring ground truth labels as a reference.

## Core Architecture: More Than Just Routing

Routing solutions such as OpenRouter and LiteLLM are already mature, and as of 2026 routing is effectively a commoditized feature. Aegis differentiates itself through its causal hallucination detection mechanism.

Traditional fact-checking needs a known correct answer to judge whether a model's output is accurate, but production environments rarely have such a reference. Aegis instead asks a causal question: if only the wording of the question changes, does the factual claim in the answer change?

If the model gives different answers to different phrasings of the same question, that is a causal signal: the claim depends on surface features of the prompt rather than on knowledge. In causal inference terms, rewording the prompt is a do(X) intervention on the phrasing while the meaning is held fixed. The method requires no labels, ground truth answers, or external knowledge bases.

## Five-Level Routing and Cost Optimization

Aegis routes requests across five model levels based on a complexity score:

| Level | Model | Cost per 1M tokens | Applicable Scenarios |
|------|------|------------------|----------|
| Free | Llama 3.1 8B (local Ollama) | $0.00 | Simple fact queries, conversations |
| Economic | Gemini 1.5 Flash | $0.075 | Low-medium complexity |
| Standard | GPT-4o-mini | $0.150 | Medium complexity |
| Premium | Claude 3.5 Haiku | $0.250 | Medium-high complexity, requires detailed reasoning |
| High-end | GPT-4o | $2.500 | Complex reasoning, high-risk scenarios |

The complexity classifier uses a four-factor weighted score: semantic embedding norm (30%), text structure score (25%), question type score (25%), and domain keyword density (20%). The score ranges from 0.0 to 1.0, and each request is routed to the cheapest model capable of handling that score.

For the legal, medical, and financial domains, the system enforces a hard gate: regardless of the complexity score, these requests always go to GPT-4o, and the classifier cannot override this rule.
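The weighted score described above can be sketched as follows. This is a minimal illustration, not the project's actual classifier: the factor names in `WEIGHTS` and the function `complexity_score` are hypothetical, and the sketch assumes each factor has already been normalized to the range 0.0 to 1.0.

```python
from typing import Dict

# Weights taken from the text. The key names are illustrative assumptions.
WEIGHTS: Dict[str, float] = {
    "embedding_norm": 0.30,
    "text_structure": 0.25,
    "question_type": 0.25,
    "domain_keyword_density": 0.20,
}


def complexity_score(factors: Dict[str, float]) -> float:
    """Return the weighted sum of the four factors, clamped to [0.0, 1.0].

    Assumes every factor value is already normalized to [0.0, 1.0].
    """
    total = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return max(0.0, min(1.0, total))


# Hypothetical factor values for a short factual question.
example_factors = {
    "embedding_norm": 0.4,
    "text_structure": 0.2,
    "question_type": 0.6,
    "domain_keyword_density": 0.1,
}
example_score = complexity_score(example_factors)
```

With these example values the four weighted terms are 0.12, 0.05, 0.15, and 0.02.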
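A routing function with the domain override might look like the sketch below. The text does not give the score cut-offs between levels, so the boundaries in `TIERS` are placeholder assumptions, and the model strings are illustrative labels rather than verified provider identifiers.

```python
from typing import Optional

HIGH_STAKES_DOMAINS = {"legal", "medical", "financial"}

# (upper bound on complexity score, model label).
# The cut-offs are hypothetical; the source does not state them.
TIERS = [
    (0.2, "llama-3.1-8b"),
    (0.4, "gemini-1.5-flash"),
    (0.6, "gpt-4o-mini"),
    (0.8, "claude-3.5-haiku"),
    (1.0, "gpt-4o"),
]


def route(score: float, domain: Optional[str] = None) -> str:
    """Pick a model label for a request.

    The domain check runs first, so the complexity score can never
    route a legal, medical, or financial request away from GPT-4o.
    """
    if domain in HIGH_STAKES_DOMAINS:
        return "gpt-4o"
    for upper_bound, model in TIERS:
        if score <= upper_bound:
            return model
    return "gpt-4o"  # Fallback for scores above 1.0.
```

Placing the domain check before the tier lookup is what makes the gate "hard": no score value reaches the cheaper tiers for those domains.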

## Level 1: Hedge Phrase Detection (Free, Runs on Every Request)

The system scans each response for 25 confidence-weakening phrases such as "I'm not sure", "I think", "maybe", and "as far as I know". A response containing three or more is flagged as a potential hallucination (medium risk). The check costs nothing and runs on every request from every provider.
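A hedge phrase scan of this kind could be sketched as below. Only the four phrases quoted in the text are listed; the remaining entries of the 25-phrase list are not given in the source and are not guessed here. Counting total occurrences (rather than distinct phrases), plain substring matching, and the "low" label for unflagged responses are all assumptions of this sketch.

```python
# Only the four example phrases quoted in the text; the full list is not shown there.
HEDGE_PHRASES = ["i'm not sure", "i think", "maybe", "as far as i know"]


def count_hedges(response: str) -> int:
    """Count occurrences of hedge phrases using case-insensitive substring matching."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return sum(text.count(phrase) for phrase in HEDGE_PHRASES)


def hedge_risk(response: str, threshold: int = 3) -> str:
    """Return "medium" when the hedge count reaches the threshold, else "low"."""
    return "medium" if count_hedges(response) >= threshold else "low"
```

For example, `hedge_risk("I think it was 1912, but I'm not sure. Maybe 1913.")` finds three hedges and returns `"medium"`.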

## Level 3: Rewrite Variance Detection (Conditionally Triggered)

When the query belongs to the legal/medical/financial domain, or the complexity score exceeds 0.7, the system triggers deep detection:

1. Use GPT-4o-mini to generate two different formulations of the same question
2. Send three versions of the question (original + two rewrites) to the target model in parallel
3. Calculate the average cosine similarity of the embedding vectors of the three responses
4. Variance = 1 - average similarity

If the variance exceeds the threshold θ=0.35, the response is marked as a high-risk hallucination. The threshold is not arbitrary: it is calibrated offline with the DoWhy library and checked for causal validity with placebo treatment refutation tests.
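Steps 3 and 4 and the threshold check can be sketched in plain Python. The sketch assumes the three responses have already been embedded (for instance with the sentence-transformers model mentioned later in this post) and that "average cosine similarity" means the mean over all response pairs; both the pairwise reading and the function names are assumptions.

```python
import math
from itertools import combinations
from typing import Sequence

THETA = 0.35  # Threshold stated in the text.


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rewrite_variance(embeddings: Sequence[Sequence[float]]) -> float:
    """Return 1 minus the mean pairwise cosine similarity of the response embeddings."""
    pairs = list(combinations(embeddings, 2))
    mean_similarity = sum(cosine(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


def is_high_risk(embeddings: Sequence[Sequence[float]], theta: float = THETA) -> bool:
    """Flag the response set when the variance exceeds the threshold."""
    return rewrite_variance(embeddings) > theta
```

Three near-identical responses give a variance close to 0 and pass; three unrelated responses push the variance toward 1 and are flagged.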

## Risk Level Merging

The final risk level is the higher of the domain risk and the detection risk. The legal and medical domains are inherently high risk and the financial domain is medium risk; rewrite variance detection raises high risk and hedge phrase detection raises medium risk.
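The merging rule amounts to taking a maximum over ordered risk levels, as in this sketch. The `Risk` enum, the `LOW` default for other domains, and the boolean flag parameters are illustrative assumptions; only the domain and detector mappings come from the text.

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Domain mapping from the text; any other domain defaults to LOW (assumption).
DOMAIN_RISK = {"legal": Risk.HIGH, "medical": Risk.HIGH, "financial": Risk.MEDIUM}


def detection_risk(hedge_flagged: bool, variance_flagged: bool) -> Risk:
    """Rewrite variance raises HIGH, hedge phrases raise MEDIUM."""
    if variance_flagged:
        return Risk.HIGH
    if hedge_flagged:
        return Risk.MEDIUM
    return Risk.LOW


def final_risk(domain: Optional[str], hedge_flagged: bool, variance_flagged: bool) -> Risk:
    """Return the higher of the domain risk and the detection risk."""
    domain_level = DOMAIN_RISK.get(domain, Risk.LOW)
    return max(domain_level, detection_risk(hedge_flagged, variance_flagged))
```

Because `IntEnum` members are ordered, `max` returns the more severe of the two levels directly.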

## Semantic Caching: Zero-Cost Hits

Aegis implements an in-memory cache keyed on embeddings from sentence-transformers/all-MiniLM-L6-v2. The similarity threshold is set to 0.85 rather than 0.95, because 0.95 yields a hit rate below 1% in practice. A cache hit returns in about 5 milliseconds at zero cost.

The cache and the hallucination detector share a single embedding model instance, which avoids loading the roughly 90MB of model weights twice. The cache is cleared when the server restarts (a design choice for the demo environment).
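An in-memory semantic cache along these lines can be sketched as follows. The `SemanticCache` class and its linear scan are illustrative assumptions, not the project's code. The embedding function is injected so that one model instance can be shared with other components; `toy_embed` is a letter-count stand-in used only to keep the example dependency-free, where a real deployment would pass in the sentence-transformers model's encoder.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


class SemanticCache:
    """In-memory cache that returns a stored response for sufficiently similar prompts."""

    def __init__(self, embed: Callable[[str], Sequence[float]], threshold: float = 0.85):
        self.embed = embed          # Shared embedding function (injected).
        self.threshold = threshold  # 0.85 as stated in the text.
        self.entries: List[Tuple[List[float], str]] = []

    @staticmethod
    def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, prompt: str) -> Optional[str]:
        """Return the best-matching cached response, or None on a miss."""
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = self._cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((list(self.embed(prompt)), response))


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding: counts of the letters a-z. Not a real semantic model."""
    lowered = text.lower()
    return [float(lowered.count(chr(ord("a") + i))) for i in range(26)]


cache = SemanticCache(embed=toy_embed)
cache.put("What is the capital of France?", "Paris")
hit = cache.get("what is the capital of france")
miss = cache.get("zzzz qqqq")
```

A linear scan is adequate for a demo-sized cache; a larger deployment would likely need a vector index, which is outside what the post describes.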
