
Noesis Tension: Decoding Prompt-Induced Representation Tension in Large Models Using Telemetry Technology

Explore how the Noesis Tension project constructs a classification system for prompt-induced tension in large language models through KV cache telemetry, cognitive state inference, and MoE routing tracking, providing a new perspective for AI safety and interpretability research.

Tags: Large Language Models · AI Safety · Explainable AI · KV Cache · Telemetry · Model Monitoring · Hallucination Detection · MoE Architecture
Published 2026-05-11 21:49 · Recent activity 2026-05-11 22:00 · Estimated read 5 min

Section 01

[Introduction] A Telemetry-Driven Approach to Prompt-Induced Representation Tension

The Noesis Tension project proposes an innovative telemetry-driven approach: by monitoring KV cache dynamics, attention mechanisms, and MoE routing patterns, it builds a classification system for prompt-induced representation tension, offering a new perspective for AI safety and interpretability research and providing early warning of potentially risky model behavior.


Section 02

Research Background: Why Do We Need Large Model 'Tension' Monitoring?

Traditional large-model safety research focuses only on inspecting inputs and outputs and cannot anticipate changes in the model's internal state. The core idea of Noesis Tension is that prompts trigger measurable 'representation tension' inside the model, which can provide early warning of hallucinations, repetitive loops, or attempts to probe safety boundaries. Analogous to vital-sign monitoring in medicine, metrics such as KV cache statistics can reveal transitions in the model's cognitive state.


Section 03

Core Technology: Analysis of the Three-Layer Telemetry System

Layer 1: KV Cache Telemetry

Tracks norm drift history, rolling coherence history, mean norm history, and drift summary statistics to quantify the model's cognitive state.
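To make the Layer-1 metrics concrete, here is a minimal sketch of what such a collector could look like, assuming access to the per-step key tensors of a decoder. Only the metric names (mean norm, norm drift, rolling coherence, drift summary) come from the article; the `KVCacheTelemetry` class, its API, and the window-based formulas are illustrative assumptions, not the project's actual implementation.

```python
from collections import deque

import numpy as np


class KVCacheTelemetry:
    """Collects per-step KV cache statistics during generation (illustrative)."""

    def __init__(self, window: int = 16):
        self.window = window
        self.mean_norm_history: list[float] = []
        self.norm_drift_history: list[float] = []
        self.coherence_history: list[float] = []
        self._recent: deque = deque(maxlen=window)

    def update(self, step_keys: np.ndarray) -> None:
        """step_keys: (num_heads, head_dim) key vectors written this step."""
        flat = step_keys.reshape(-1).astype(np.float64)

        # Mean norm of this step's key vectors.
        mean_norm = float(np.linalg.norm(step_keys, axis=-1).mean())
        self.mean_norm_history.append(mean_norm)

        # Norm drift: relative deviation from a rolling baseline.
        baseline = float(np.mean(self.mean_norm_history[-self.window:]))
        self.norm_drift_history.append(abs(mean_norm - baseline) / (baseline + 1e-8))

        # Rolling coherence: cosine similarity to the previous step's keys.
        if self._recent:
            prev = self._recent[-1]
            denom = np.linalg.norm(flat) * np.linalg.norm(prev) + 1e-8
            self.coherence_history.append(float(flat @ prev / denom))
        self._recent.append(flat)

    def drift_summary(self) -> dict:
        """Summary statistics over the drift history."""
        d = np.asarray(self.norm_drift_history or [0.0])
        return {"mean": float(d.mean()), "max": float(d.max()), "last": float(d[-1])}
```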

Layer 2: Cognitive State Inference Engine

Automatically identifies four states: safe procedural, symbolic repetition drift, confident hallucination (lightweight variant), and critical drift.
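A rule-based sketch of how Layer 2 might map the Layer-1 metrics to these four states. The state labels come from the article; the thresholds and decision order below are invented for illustration and would need calibration against real telemetry.

```python
def infer_state(drift: float, coherence: float) -> str:
    """Map Layer-1 metrics to one of the four cognitive states (assumed thresholds)."""
    if drift > 0.5:
        return "CRITICAL_DRIFT"                 # keys moving far from baseline
    if coherence > 0.98:
        return "SYMBOLIC_REPETITION_DRIFT"      # near-identical steps: loop risk
    if coherence > 0.9 and drift > 0.2:
        return "CONFIDENT_HALLUCINATION_LITE"   # highly self-similar yet drifting
    return "SAFE_PROCEDURAL"
```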

Layer 3: MoE Routing Tracking

Records the distribution of experts activated at each generation step in MoE-architecture models, revealing how expert resources are called upon under different cognitive states.
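A sketch of what Layer-3 routing tracking could look like, assuming the router's top-k expert choices can be captured at each step (e.g., via a forward hook). The counter-based summary and the entropy readout are assumptions added for illustration, not the project's interface.

```python
import math
from collections import Counter


class MoERoutingTracker:
    """Records which experts the router activates at each generation step."""

    def __init__(self, num_experts: int):
        self.num_experts = num_experts
        self.per_step: list[list[int]] = []   # expert ids chosen at each step
        self.totals: Counter = Counter()

    def record(self, expert_ids: list[int]) -> None:
        """Called once per step with the router's top-k expert indices."""
        self.per_step.append(expert_ids)
        self.totals.update(expert_ids)

    def distribution(self) -> list[float]:
        """Fraction of activations routed to each expert so far."""
        total = sum(self.totals.values()) or 1
        return [self.totals[e] / total for e in range(self.num_experts)]

    def entropy(self) -> float:
        """Routing entropy; low values mean a few experts dominate."""
        probs = [p for p in self.distribution() if p > 0]
        return -sum(p * math.log(p) for p in probs)
```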


Section 04

Technical Implementation and Experimental Findings

The project adopts a pure-telemetry classification strategy, which is unaffected by how a prompt is encoded and can therefore detect jailbreak attempts. Experimental findings: Llama-3.1-8B shows higher tension values on safe prompts than Mistral-7B; creative tasks are easily misclassified as repetitive drift; and a conservative marking strategy (tension ≥ 0.67 together with a significant peak triggers HIGH_TENSION) balances false positives against false negatives.
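The conservative marking rule can be read as a simple predicate. In this sketch, only the 0.67 threshold and the AND-combination of high tension with a significant peak come from the article; how tension is aggregated and what counts as a "significant peak" (here, 1.5× the series mean) are assumptions for illustration.

```python
def mark_tension(tension_series: list[float],
                 threshold: float = 0.67,
                 peak_ratio: float = 1.5) -> str:
    """Flag HIGH_TENSION only when tension >= 0.67 AND a significant peak exists."""
    if not tension_series:
        return "OK"
    peak = max(tension_series)
    mean = sum(tension_series) / len(tension_series)
    significant_peak = peak > peak_ratio * mean   # hypothetical peak criterion
    if peak >= threshold and significant_peak:
        return "HIGH_TENSION"
    return "OK"
```

Requiring both conditions is what makes the rule conservative: a single noisy spike below the threshold, or a uniformly elevated series with no distinct peak, does not trigger the flag.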


Section 05

Application Scenarios: Practical Value Across Multiple Domains

  1. AI Safety Research: Provide quantitative tools for red team testing to identify subtle jailbreak patterns;
  2. Model Interpretability: Observe internal state differences across different models/training stages;
  3. Production Monitoring: Lightweight runtime monitoring to trigger manual review or automatic retries (see the sketch after this list);
  4. Model Comparison and Evaluation: Complement traditional benchmark tests to evaluate safety and stability.
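For the production-monitoring scenario, a guard around the model call might look like the following, reusing mark_tension from the Section 04 sketch. generate() and flag_for_review() are placeholders for whatever a serving stack actually provides; this is a hypothetical integration, not the project's API.

```python
def generate(prompt: str) -> tuple[str, list[float]]:
    """Placeholder model call returning (text, per-step tension series)."""
    return "example output", [0.1, 0.2, 0.15]


def flag_for_review(prompt: str, text: str) -> str:
    """Placeholder: enqueue the exchange for human review, serve a fallback."""
    return "[response held for manual review]"


def guarded_generate(prompt: str, max_retries: int = 1) -> str:
    text = ""
    for _ in range(max_retries + 1):
        text, tension_series = generate(prompt)
        if mark_tension(tension_series) != "HIGH_TENSION":
            return text                      # normal path: serve the output
    return flag_for_review(prompt, text)     # persistent tension: escalate
```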

Section 06

Limitations and Future: Discussion on Improvement Directions

Current limitations: classification precision on creative content needs improvement, differences between models require calibration, and the analysis is restricted to single-turn dialogue. Future directions: introduce context-aware features to distinguish intentional repetition from runaway loops, explore model-agnostic normalization methods, and study how tension accumulates across turns in multi-turn dialogue.


Section 07

Conclusion: An Important Step Toward Interpretable AI

Noesis Tension represents a shift in safety research toward monitoring internal state, providing earlier risk warnings and opening a window into the model's black box. The project code is open-sourced on GitHub (v3.0-stable), making it a practical tool for AI safety and interpretability researchers.