
Noesis Tension: Decoding Prompt-Induced Representation Tension in Large Models Using Telemetry Technology

Explore how the Noesis Tension project constructs a classification system for prompt-induced tension in large language models through KV cache telemetry, cognitive state inference, and MoE routing tracking, providing a new perspective for AI safety and interpretability research.

Tags: Large Language Models · AI Safety · Explainable AI · KV Cache · Telemetry · Model Monitoring · Hallucination Detection · MoE Architecture
Published 2026-05-11 21:49 · Recent activity 2026-05-11 22:00 · Estimated read 5 min

Section 01

[Introduction] A Telemetry-Driven Approach to Prompt-Induced Representation Tension

The Noesis Tension project proposes an innovative telemetry-driven approach: by monitoring KV cache dynamics, attention mechanisms, and MoE routing patterns, it builds a classification system for prompt-induced representation tension, offering a new perspective for AI safety and interpretability research and providing early warning of potentially risky model behavior.


Section 02

Research Background: Why Do We Need Large Model 'Tension' Monitoring?

Traditional large-model safety research focuses only on inspecting inputs and outputs and cannot anticipate changes in the model's internal state. The core idea of Noesis Tension is that prompts trigger measurable 'representation tension' inside the model, which can provide early warning of hallucinations, repetitive loops, or attempts to probe safety boundaries. Analogous to vital-sign monitoring in medicine, metrics such as KV cache statistics can reveal transitions in the model's cognitive state.


Section 03

Core Technology: Analysis of the Three-Layer Telemetry System

Layer 1: KV Cache Telemetry

Tracks norm drift history, rolling coherence history, mean norm history, and drift summary statistics to quantify the model's cognitive state.
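To make the Layer-1 metrics concrete, here is a minimal sketch of what such a collector could look like, assuming access to the per-step key tensors of a decoder. Only the metric names (mean norm, norm drift, rolling coherence, drift summary) come from the article; the `KVCacheTelemetry` class, its API, and the window-based formulas are illustrative assumptions, not the project's actual implementation.

```python
from collections import deque

import numpy as np


class KVCacheTelemetry:
    """Collects per-step KV cache statistics during generation (illustrative)."""

    def __init__(self, window: int = 16):
        self.window = window
        self.mean_norm_history: list[float] = []
        self.norm_drift_history: list[float] = []
        self.coherence_history: list[float] = []
        self._recent: deque = deque(maxlen=window)

    def update(self, step_keys: np.ndarray) -> None:
        """step_keys: (num_heads, head_dim) key vectors written this step."""
        flat = step_keys.reshape(-1).astype(np.float64)

        # Mean norm of this step's key vectors.
        mean_norm = float(np.linalg.norm(step_keys, axis=-1).mean())
        self.mean_norm_history.append(mean_norm)

        # Norm drift: relative deviation from a rolling baseline.
        baseline = float(np.mean(self.mean_norm_history[-self.window:]))
        self.norm_drift_history.append(abs(mean_norm - baseline) / (baseline + 1e-8))

        # Rolling coherence: cosine similarity to the previous step's keys.
        if self._recent:
            prev = self._recent[-1]
            denom = np.linalg.norm(flat) * np.linalg.norm(prev) + 1e-8
            self.coherence_history.append(float(flat @ prev / denom))
        self._recent.append(flat)

    def drift_summary(self) -> dict:
        """Summary statistics over the drift history."""
        d = np.asarray(self.norm_drift_history or [0.0])
        return {"mean": float(d.mean()), "max": float(d.max()), "last": float(d[-1])}
```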

Layer 2: Cognitive State Inference Engine

Automatically identifies four states: safe procedural, symbolic repetition drift, confident hallucination (lightweight variant), and critical drift.
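A rule-based sketch of how Layer 2 might map the Layer-1 metrics to these four states. The state labels come from the article; the thresholds and decision order below are invented for illustration and would need calibration against real telemetry.

```python
def infer_state(drift: float, coherence: float) -> str:
    """Map Layer-1 metrics to one of the four cognitive states (assumed thresholds)."""
    if drift > 0.5:
        return "CRITICAL_DRIFT"                 # keys moving far from baseline
    if coherence > 0.98:
        return "SYMBOLIC_REPETITION_DRIFT"      # near-identical steps: loop risk
    if coherence > 0.9 and drift > 0.2:
        return "CONFIDENT_HALLUCINATION_LITE"   # highly self-similar yet drifting
    return "SAFE_PROCEDURAL"
```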

Layer 3: MoE Routing Tracking

Records the distribution of experts activated at each generation step in MoE-architecture models, revealing how expert resources are called upon under different cognitive states.
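A sketch of what Layer-3 routing tracking could look like, assuming the router's top-k expert choices can be captured at each step (e.g., via a forward hook). The counter-based summary and the entropy readout are assumptions added for illustration, not the project's interface.

```python
import math
from collections import Counter


class MoERoutingTracker:
    """Records which experts the router activates at each generation step."""

    def __init__(self, num_experts: int):
        self.num_experts = num_experts
        self.per_step: list[list[int]] = []   # expert ids chosen at each step
        self.totals: Counter = Counter()

    def record(self, expert_ids: list[int]) -> None:
        """Called once per step with the router's top-k expert indices."""
        self.per_step.append(expert_ids)
        self.totals.update(expert_ids)

    def distribution(self) -> list[float]:
        """Fraction of activations routed to each expert so far."""
        total = sum(self.totals.values()) or 1
        return [self.totals[e] / total for e in range(self.num_experts)]

    def entropy(self) -> float:
        """Routing entropy; low values mean a few experts dominate."""
        probs = [p for p in self.distribution() if p > 0]
        return -sum(p * math.log(p) for p in probs)
```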


Section 04

Technical Implementation and Experimental Findings

The project adopts a pure-telemetry classification strategy, which is unaffected by how a prompt is encoded and can therefore detect jailbreak attempts. Experimental findings: Llama-3.1-8B shows higher tension values on safe prompts than Mistral-7B; creative tasks are easily misclassified as repetitive drift; and a conservative marking strategy (tension ≥ 0.67 together with a significant peak triggers HIGH_TENSION) balances false positives against false negatives.
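The conservative marking rule can be read as a simple predicate. In this sketch, only the 0.67 threshold and the AND-combination of high tension with a significant peak come from the article; how tension is aggregated and what counts as a "significant peak" (here, 1.5× the series mean) are assumptions for illustration.

```python
def mark_tension(tension_series: list[float],
                 threshold: float = 0.67,
                 peak_ratio: float = 1.5) -> str:
    """Flag HIGH_TENSION only when tension >= 0.67 AND a significant peak exists."""
    if not tension_series:
        return "OK"
    peak = max(tension_series)
    mean = sum(tension_series) / len(tension_series)
    significant_peak = peak > peak_ratio * mean   # hypothetical peak criterion
    if peak >= threshold and significant_peak:
        return "HIGH_TENSION"
    return "OK"
```

Requiring both conditions is what makes the rule conservative: a single noisy spike below the threshold, or a uniformly elevated series with no distinct peak, does not trigger the flag.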


Section 05

Application Scenarios: Practical Value Across Multiple Domains

  1. AI Safety Research: Provide quantitative tools for red team testing to identify subtle jailbreak patterns;
  2. Model Interpretability: Observe internal state differences across different models/training stages;
  3. Production Monitoring: Lightweight runtime monitoring to trigger manual review or automatic retries (see the sketch after this list);
  4. Model Comparison and Evaluation: Complement traditional benchmark tests to evaluate safety and stability.
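For the production-monitoring scenario, a guard around the model call might look like the following, reusing mark_tension from the Section 04 sketch. generate() and flag_for_review() are placeholders for whatever a serving stack actually provides; this is a hypothetical integration, not the project's API.

```python
def generate(prompt: str) -> tuple[str, list[float]]:
    """Placeholder model call returning (text, per-step tension series)."""
    return "example output", [0.1, 0.2, 0.15]


def flag_for_review(prompt: str, text: str) -> str:
    """Placeholder: enqueue the exchange for human review, serve a fallback."""
    return "[response held for manual review]"


def guarded_generate(prompt: str, max_retries: int = 1) -> str:
    text = ""
    for _ in range(max_retries + 1):
        text, tension_series = generate(prompt)
        if mark_tension(tension_series) != "HIGH_TENSION":
            return text                      # normal path: serve the output
    return flag_for_review(prompt, text)     # persistent tension: escalate
```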

Section 06

Limitations and Future: Discussion on Improvement Directions

Current limitations: classification precision on creative content needs improvement, differences between models require calibration, and the analysis is restricted to single-turn dialogue. Future directions: introduce context-aware features to distinguish intentional repetition from runaway loops, explore model-agnostic normalization methods, and study how tension accumulates across turns in multi-turn dialogue.


Section 07

Conclusion: An Important Step Toward Interpretable AI

Noesis Tension represents a shift in safety research toward monitoring internal state, providing earlier risk warnings and opening a window into the model's black box. The project code is open-sourced on GitHub (v3.0-stable), making it a practical tool for AI safety and interpretability researchers.