
SAS: An Open-Source Framework for Generative AI Hallucination Detection Based on Topological Data Analysis

SAS (Symbiotic Autoprotection System) is an open-source API framework designed to detect structural hallucinations in generative AI outputs. Developed by Gonzalo Emir Durante, the project combines Topological Data Analysis (TDA), numerical invariance checks, and modular detection probes, and reports 98.8% accuracy and 100% precision on 2,000 test sample pairs.

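Tags: Generative AI, Hallucination Detection, Topological Data Analysis, Durante Constant, AI Safety, Open-Source Framework, FastAPI, Machine Learning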

Published 2026-04-30 07:07 | Recent activity 2026-04-30 09:59 | Estimated read 7 min

Section 01

Introduction: SAS, an Open-Source Framework for Generative AI Hallucination Detection Based on Topological Data Analysis

Section 02

Background: The Hallucination Dilemma of Generative AI

Generative AI systems such as large language models have made remarkable progress in recent years and can generate fluent, coherent text. They share a fundamental problem, however: they may produce "structural hallucinations", outputs that read fluently on the surface but contain deep logical inconsistencies, numerical errors, or semantic deviations from the input content.

Traditional similarity metrics such as cosine similarity and BLEU often fail to detect these issues, because hallucinated content can stay fluent on the surface while breaking deep semantic consistency. For example, a model might confidently claim "The Eiffel Tower is located in Berlin, Germany", which is grammatically correct and uses plausible words but is factually wrong.
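To make the limitation concrete, here is a minimal sketch that scores the Eiffel Tower example with a plain bag-of-words cosine similarity. The helper names (`bag_of_words`, `cosine_similarity`) and the correct reference sentence are assumptions made for illustration; this is not code from SAS.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative pair: a correct reference and the hallucinated claim.
source = "The Eiffel Tower is located in Paris, France"
response = "The Eiffel Tower is located in Berlin, Germany"
score = cosine_similarity(source, response)
print(round(score, 2))  # 0.75: six of the eight words are shared
```

The score stays high even though the claim is factually wrong, which is the gap a structural audit is meant to close.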

Section 03

Overview of the SAS Framework

SAS (Symbiotic Autoprotection System) is an open-source API framework developed by Gonzalo Emir Durante to address this challenge. Released in April 2026, the project is distributed under the GPL-3.0 + Durante Invariance License and is registered on Zenodo (DOI: 10.5281/zenodo.19689077).

The core idea of SAS is to treat hallucination detection as a "structural consistency audit" problem rather than a simple similarity calculation. The framework evaluates whether the generated response maintains consistency with the source text or prompt in the following dimensions:

Semantic structure integrity
Logical consistency
Numerical accuracy
Factual anchoring signals

Section 04

Durante Constant (κD = 0.56)

SAS introduces a threshold parameter, the Durante Constant κD, with a value of 0.56. It acts as the framework's consistency threshold and is described as the critical point at which semantic noise drops below structural consistency and meaning is sufficiently stable.

Operational explanation:

When the Invariant Similarity Index (ISI) ≥ κD, the response is judged "structurally consistent"
When ISI < κD, the response is judged a "possible manifold rupture/hallucination signal"
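The decision rule can be sketched in a few lines. The function name `classify_isi` and the sample scores are hypothetical; only the threshold value 0.56 and the two verdict labels come from the text.

```python
KAPPA_D = 0.56  # Durante Constant, value as stated in the text


def classify_isi(isi: float, kappa_d: float = KAPPA_D) -> str:
    """Map an ISI score to one of the two verdicts described above."""
    if isi >= kappa_d:
        return "structurally consistent"
    return "possible manifold rupture/hallucination signal"


# Illustrative scores only; they are not measurements from SAS.
for sample in (0.82, 0.56, 0.41):
    print(sample, "->", classify_isi(sample))
```

Note that the boundary case ISI = κD counts as consistent, because the rule uses ≥.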

Section 05

Topological Data Analysis (TDA)

SAS uses Topological Data Analysis to compare semantic structures. TDA captures shape features of high-dimensional data and can identify rupture points in semantic manifolds, that is, regions where the model output deviates from the semantic structure of the input. This goes beyond traditional bag-of-words models and vector similarity, and allows deeper semantic fractures to be detected.
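The text does not describe SAS's actual TDA pipeline, so the following is only a generic illustration of the idea: it computes 0-dimensional persistent homology (the scales at which connected components merge in a Vietoris-Rips filtration) for two toy point clouds and compares the results. The function `h0_death_times`, the 2-D "embeddings", and the crude comparison of sorted death times are all assumptions; a real system would likely use a dedicated TDA library and a proper diagram distance.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death times of 0-dimensional features in a Vietoris-Rips filtration.

    Every point starts as its own component at scale 0. A component dies
    at the length of the edge that first merges it into another component
    (the same merges Kruskal's minimum spanning tree algorithm performs).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 values in ascending order


# Toy clouds: the response has one point far away from the others.
source_cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
response_cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

source_deaths = h0_death_times(source_cloud)
response_deaths = h0_death_times(response_cloud)

# Crude comparison of sorted death times, not a bottleneck distance.
gap = max(abs(a - b) for a, b in zip(source_deaths, response_deaths))
print(source_deaths, response_deaths, round(gap, 3))
```

In the response cloud the last component survives until a much larger scale, and that long-lived feature is the kind of structural difference a topological comparison can surface.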

Section 06

Invariant Similarity Index (ISI)

ISI is the core scoring metric of SAS. It combines TDA results with numerical invariance checks to quantify the structural similarity between the source text and the generated response. Unlike soft similarity measures such as cosine similarity, ISI is designed to be sensitive to structural fractures while remaining robust to surface-level changes.

Section 07

Numerical Invariance Guard (NIG)

The Numerical Invariance Guard (NIG) checks numerical consistency. When the model output contains numbers, dates, or statistics, the NIG module verifies that these values agree with the source information, which catches common "numerical hallucination" issues.
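A numerical consistency check of this kind can be approximated with a regular expression. The sketch below is an assumption about how such a guard might work, not the NIG implementation: `extract_numbers` and `unsupported_numbers` are hypothetical helpers, and the example handles only plain numeric literals (with optional thousands separators), not written-out dates or units.

```python
import re

# Matches "1,200", "1,200.5", "2023", and "12.5".
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_numbers(text: str) -> set:
    """Collect numeric literals from the text as floats."""
    return {float(tok.replace(",", "")) for tok in NUMBER.findall(text)}


def unsupported_numbers(source: str, response: str) -> set:
    """Numbers that appear in the response but nowhere in the source."""
    return extract_numbers(response) - extract_numbers(source)


# Illustrative pair: the response changes 12.5% to 15%.
source = "Revenue grew 12.5% to 1,200 units in 2023."
response = "Revenue grew 15% to 1,200 units in 2023."
print(sorted(unsupported_numbers(source, response)))  # [15.0]
```

An empty result means every number in the response is backed by the source; a non-empty result is a candidate numerical hallucination.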

Section 08

Modular Detection Probes (E9-E12)

SAS includes four experimental detection modules, each of which can be enabled independently:

E9 - Logical Contradiction Detection: Identifies internally reversed logic or self-contradictory statements. For example, claiming both "All swans are white" and "There exist black swans" will be marked as a logical contradiction.

E10 - Factual Anchoring Check: Detects unsupported claims when local knowledge sources are available. This module evaluates whether the model output is based on verifiable facts or is "narrative fiction".

E11 - Temporal Inconsistency Detection: Identifies incompatible time sequences. For example, a claim that an event occurred before its prerequisites is flagged.

E12 - Topic Drift Detection: Detects sudden topic changes without transition signals. It is triggered when the model abruptly shifts to an unrelated topic within its response.

These modules operate as independent penalty factors, complementing rather than replacing the core ISI/TDA calculation.
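The text states that the probes act as independent penalty factors but does not give the formula. One plausible reading, shown here purely as an assumption, is that each enabled probe contributes a multiplicative factor in (0, 1] applied to the base ISI. The function `apply_penalties`, the factor values, and the base score are hypothetical.

```python
from math import prod

KAPPA_D = 0.56  # Durante Constant, value as stated in the text


def apply_penalties(base_isi: float, penalties: dict) -> float:
    """Scale a base ISI by one factor in (0, 1] per enabled probe.

    Multiplicative combination is an assumption of this sketch; the
    source only says the probes are independent penalty factors.
    """
    for name, factor in penalties.items():
        if not 0.0 < factor <= 1.0:
            raise ValueError(f"penalty for {name} must be in (0, 1]")
    return base_isi * prod(penalties.values())


# Hypothetical probe outputs: 1.0 means the probe found nothing.
penalties = {"E9": 1.0, "E10": 1.0, "E11": 0.8, "E12": 0.9}
final = apply_penalties(0.70, penalties)
print(round(final, 3), final >= KAPPA_D)  # 0.504 False
```

In this illustration a base score that would pass the κD threshold on its own falls below it once the temporal and topic-drift penalties are applied, which matches the stated role of the probes as a complement to the core ISI/TDA calculation.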
