Zing Forum

ProbeLogits: A New Security Paradigm for Integrating LLM Reasoning into OS Kernels

This article introduces ProbeLogits, a security mechanism that directly reads LLM logits at the kernel layer for action classification. It achieves high-precision governance without text generation, laying a new security foundation for AI-native operating systems.

Tags: ProbeLogits, LLM inference, OS kernel, AI security, logits classification, Anima OS, kernel-level governance, zero learned parameters, calibration strength α, WASM sandbox
Published 2026-04-14 02:32 · Recent activity 2026-04-15 10:19 · Estimated read: 7 min

Section 01

ProbeLogits: Guide to the New Security Paradigm of LLM Reasoning at the Kernel Layer

ProbeLogits is a security mechanism that embeds LLM reasoning directly into the operating system kernel. Its core idea is to read the model's logits (the raw next-token scores produced before any text is generated) at the kernel layer and use them for security classification, enabling high-precision governance without text generation. This sidesteps the two problems of traditional application-layer security checks — high latency (over 650ms accumulated across steps) and easy bypass — laying a new security foundation for AI-native operating systems.

Section 02

Background: Architectural Bottlenecks in AI Security

Current LLM security governance faces an architectural dilemma. The traditional approach constructs a prompt, waits for the model to generate text, then parses the response to reach a decision; these multiple steps add up to high latency (over 650ms). More seriously, application-layer security filters run at the same privilege level as the agents they constrain, so a malicious agent can modify or disable the filters to bypass the checks.

3

Section 03

Core Technical Methods of ProbeLogits

Technical Principles

For a given prompt, the model outputs a logit vector. For binary classification, compute the constrained softmax over just the "Yes" and "No" tokens: P(Yes) = exp(logit_Yes) / (exp(logit_Yes) + exp(logit_No)). This requires only a single forward pass, with no text generation and no response parsing.
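As a minimal sketch (illustrative code, not the kernel's actual implementation), the two-token constrained softmax can be computed in sigmoid form, which avoids exponentiating large logits directly:

```rust
/// Constrained softmax over the "Yes"/"No" logits only:
/// P(Yes) = exp(l_yes) / (exp(l_yes) + exp(l_no)),
/// rewritten in sigmoid form 1 / (1 + exp(l_no - l_yes)) for stability.
fn p_yes(logit_yes: f64, logit_no: f64) -> f64 {
    1.0 / (1.0 + (logit_no - logit_yes).exp())
}

fn main() {
    // One forward pass yields the full logit vector; the probe reads two entries.
    let p = p_yes(3.2, 1.2);
    println!("P(Yes) = {:.3}", p);
}
```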

Three Kernel Primitives

  1. probe_yes_no: binary classification for security judgments; returns a category and a confidence in [0.5, 1.0];
  2. probe_classify: N-way classification; each label corresponds to a single token in the vocabulary;
  3. text_to_id: converts text to a token ID, implemented with a BTreeMap for O(log |V|) lookup.
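The three primitive names come from the article, but their kernel signatures are not given; the sketch below fills in plausible shapes for illustration only:

```rust
use std::collections::BTreeMap;

/// text_to_id: O(log |V|) vocabulary lookup via a BTreeMap (layout assumed).
fn text_to_id(vocab: &BTreeMap<String, u32>, text: &str) -> Option<u32> {
    vocab.get(text).copied()
}

/// probe_yes_no: binary decision plus a confidence in [0.5, 1.0].
fn probe_yes_no(logits: &[f64], yes_id: usize, no_id: usize) -> (bool, f64) {
    let p_yes = 1.0 / (1.0 + (logits[no_id] - logits[yes_id]).exp());
    if p_yes >= 0.5 { (true, p_yes) } else { (false, 1.0 - p_yes) }
}

/// probe_classify: N-way softmax over one token per label;
/// the winning label's confidence lies in [1/N, 1.0].
fn probe_classify(logits: &[f64], label_ids: &[usize]) -> (usize, f64) {
    let max = label_ids.iter().map(|&i| logits[i]).fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = label_ids.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    let (best, _) = exps
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .unwrap();
    (best, exps[best] / sum)
}

fn main() {
    let mut vocab = BTreeMap::new();
    vocab.insert("Yes".to_string(), 0u32);
    vocab.insert("No".to_string(), 1u32);
    let logits = [2.0, -1.0, 0.5];
    let yes = text_to_id(&vocab, "Yes").unwrap() as usize;
    let no = text_to_id(&vocab, "No").unwrap() as usize;
    let (allow, conf) = probe_yes_no(&logits, yes, no);
    println!("allow={} conf={:.3}", allow, conf);
}
```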

Section 04

Performance and Policy Adjustment

Benchmark Test Results

  • OS action classification: on a 260-prompt benchmark (9 categories, including adversarial attacks), a 7B model (4-bit quantization) achieves F1 = 0.980, precision = 1.000, recall = 0.960 with zero learned parameters;
  • ToxicChat detection: on a dataset of 1,000 real conversations, F1 = 0.790 at α = 1.0, improving to 0.837 at α = 0.5 (89% of Llama Guard 3's performance);
  • Latency: the traditional pipeline takes over 650ms, while ProbeLogits takes only 65ms with the 7B model — a roughly 10× reduction.

Calibration Strength α

α is a deployment-policy knob: a strict policy (α ≥ 0.8) maximizes recall (suited to privileged operations), while a loose policy (α = 0.5) maximizes precision (suited to dialogue agents). Contextual calibration raises accuracy from 64.8% to 97.3%.
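One common way to realize contextual calibration is to estimate the model's prior bias from a content-free prompt and subtract a fraction of it controlled by α. The article does not state ProbeLogits' exact formula, so the sketch below is purely illustrative of α as a policy knob:

```rust
/// Illustrative calibration sketch (NOT the article's formula): subtract
/// α times the content-free "Yes"/"No" margin (the model's prior bias)
/// from the raw margin before the constrained softmax.
fn calibrated_p_yes(
    logit_yes: f64,
    logit_no: f64,
    bias_yes: f64, // logits measured on a content-free prompt (assumed setup)
    bias_no: f64,
    alpha: f64,
) -> f64 {
    let margin = (logit_yes - logit_no) - alpha * (bias_yes - bias_no);
    1.0 / (1.0 + (-margin).exp())
}

fn main() {
    // A model biased toward "Yes" (prior margin +1.5): a larger α
    // removes more of that bias from the decision.
    for &alpha in &[0.5, 1.0] {
        let p = calibrated_p_yes(1.0, 0.2, 1.8, 0.3, alpha);
        println!("alpha={:.1}  P(Yes)={:.3}", alpha, p);
    }
}
```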

Section 05

Kernel-Level Security and Robustness Guarantees

Kernel-Level Enforcement

In Anima OS (a bare-metal x86_64 system written in Rust), agent actions must be executed through kernel-mediated host functions. ProbeLogits checks run beneath the WASM sandbox boundary; even if a malicious agent escapes the sandbox, it cannot bypass the checks.
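A hedged sketch of the enforcement idea: the check lives inside the host function itself, below the sandbox boundary, so no agent code path reaches the action without passing through it. All names and thresholds here are hypothetical:

```rust
/// Illustrative kernel-mediated gate (names and thresholds are hypothetical).
#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Deny,
}

/// Policy gate: allow only when the probe's safety probability clears the floor.
fn gate(p_safe: f64, confidence_floor: f64) -> Verdict {
    if p_safe >= confidence_floor {
        Verdict::Allow
    } else {
        Verdict::Deny
    }
}

/// Hypothetical host function: the WASM agent can only *request* the write;
/// the kernel runs the probe and decides below the sandbox boundary.
fn host_write_file(path: &str, p_safe_from_probe: f64) -> Result<(), String> {
    // Privileged file writes use a strict confidence floor.
    match gate(p_safe_from_probe, 0.9) {
        Verdict::Allow => Ok(()), // the real kernel would perform the write here
        Verdict::Deny => Err(format!("write to {} denied by ProbeLogits gate", path)),
    }
}

fn main() {
    assert!(host_write_file("/etc/policy", 0.97).is_ok());
    assert!(host_write_file("/etc/policy", 0.70).is_err());
    println!("gate ok");
}
```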

Robustness Guarantees

  1. No parsing failures: the output is a floating-point probability in [0, 1]; there is no text to parse;
  2. Bounded confidence: binary classification confidence lies in [0.5, 1.0]; N-way classification confidence lies in [1/N, 1.0];
  3. Graceful degradation: when the model is uncertain, confidence approaches the floor (0.5 for binary, 1/N for N-way), at which point the model can be upgraded or the decision escalated to manual review;
  4. Numerical stability: log-sum-exp prevents overflow, a uniform fallback prevents division by zero, and f64 accumulation prevents precision loss.
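The numerical-stability measures in the last item can be sketched as follows (an illustrative implementation, not the kernel's actual code):

```rust
/// Stable softmax sketch: log-sum-exp shift prevents overflow, f64
/// accumulation limits precision loss, and a uniform fallback guards
/// against a zero or non-finite denominator.
fn stable_softmax(logits: &[f64]) -> Vec<f64> {
    let n = logits.len();
    // Shift by the max so the largest exponent is exp(0) = 1 (no overflow).
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum(); // f64 accumulation
    if sum == 0.0 || !sum.is_finite() {
        // Uniform fallback: degrade gracefully instead of dividing by zero.
        return vec![1.0 / n as f64; n];
    }
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // Huge logits would overflow a naive exp(); the shifted version is fine.
    let p = stable_softmax(&[1000.0, 999.0, 998.0]);
    println!("{:?}", p);
}
```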

Section 06

Implementation and Future Outlook

Implementation Status

ProbeLogits is fully implemented in Anima OS: SmolLM2-135M runs at 1666 tokens/second (1.39× faster than llama.cpp), and Qwen2.5-7B at 15 tokens/second (on par with llama.cpp once DDR5 bandwidth is saturated). A single classification costs 0.6ms with the 135M model and 65ms with the 7B model, supporting real-time governance.

Significance and Outlook

ProbeLogits realizes a paradigm shift in AI security architecture: from the application layer to the kernel layer, from text generation to direct signal reading, from probabilistic detection to structural enforcement. It shows that an operating system can directly understand and govern the internal state of AI workloads, charting a technical path for AI-native operating systems; it may become a standard OS component in the future.