
Southeast University Team Proposes New Streaming Safety Detection Method: SPRT Framework Enables Real-Time Toxic Content Interception for LLMs

The research team from Southeast University proposes a streaming safety detection framework based on Sequential Probability Ratio Test (SPRT), which can detect toxic content in real time during LLM generation, achieving 77%-96% token savings and marking an important breakthrough in the field of AI safety.

SPRT · LLM Safety · Streaming Detection · Sequential Hypothesis Testing · Southeast University · Toxicity Detection · AI Safety · Real-Time Detection · Statistical Learning · Open Source
Published 2026-04-04 07:45 · Recent activity 2026-04-04 07:49 · Estimated read 6 min

Section 01

[Main Post/Introduction] Southeast University's SPRT Streaming Framework: A New Breakthrough in Real-Time Toxic Content Interception for LLMs

The research team from Southeast University proposes a streaming safety detection framework based on the Sequential Probability Ratio Test (SPRT), which detects toxic content in real time while a Large Language Model (LLM) generates output, achieving 77%-96% token savings. The framework rests on a rigorous theoretical foundation, provides strict bounds on false-positive and false-negative rates, and has been open-sourced, marking an important breakthrough in the field of AI safety.


Section 02

Background: Real-Time Detection Needs and Challenges in AI Safety

As LLM capabilities improve, the safety of generated content has become a central concern. The traditional 'post-generation detection' paradigm has limitations: for long outputs, users may be exposed to harmful content before detection completes, and compute is wasted generating text that is ultimately discarded. Streaming detection (monitoring and intercepting in real time during generation) is a promising direction, but it must balance detection accuracy against early decisions while keeping false-positive and false-negative rates within controlled bounds.


Section 03

Method: Core Mechanism of the SPRT Framework

The Southeast University team proposes the Contextual SPRT framework, whose core is cumulative log-likelihood-ratio monitoring: for each generated token, compute the likelihood ratio of the token under the toxic versus safe hypotheses, accumulate the log ratio, and issue a judgment once the cumulative statistic crosses a preset threshold. Theoretically, this method bounds the false-positive rate (α ≤ 0.05) and false-negative rate (β ≤ 0.10). In addition, a prior-probability parameter π lets the test adapt to scenarios where the proportion of toxic content is unbalanced.
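The accumulation-and-threshold loop described above can be sketched as follows. This is a minimal illustration of classical Wald SPRT, not the team's released code: the per-token `(p_toxic, p_safe)` classifier interface, the treatment of π as an initial log-odds offset, and all parameter defaults are assumptions for illustration.

```python
import math

def sprt_stream(token_probs, alpha=0.05, beta=0.10, prior=0.5):
    """Run a sequential probability ratio test over streaming tokens.

    token_probs: iterable of (p_toxic, p_safe) pairs, e.g. per-token
    outputs of a toxicity classifier (hypothetical interface).
    Returns ("toxic" | "safe" | "undecided", tokens_consumed).
    """
    # Wald's decision thresholds derived from the target error rates.
    upper = math.log((1 - beta) / alpha)   # cross it: accept H1 (toxic)
    lower = math.log(beta / (1 - alpha))   # cross it: accept H0 (safe)

    # Start the statistic at the prior log-odds -- one plausible way to
    # fold in the class-imbalance parameter pi mentioned in the paper.
    llr = math.log(prior / (1 - prior))

    n = 0
    for p_toxic, p_safe in token_probs:
        n += 1
        llr += math.log(p_toxic / p_safe)  # accumulate log-likelihood ratio
        if llr >= upper:
            return "toxic", n   # intercept generation early
        if llr <= lower:
            return "safe", n    # stop monitoring early
    return "undecided", n       # stream ended without a decision
```

With strongly toxic evidence (e.g. `(0.9, 0.1)` per token), the statistic crosses the upper threshold after only two tokens, which is the source of the token savings the paper reports: the test stops as soon as the evidence suffices.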


Section 04

Experimental Evidence: Performance Validated on Four Datasets

The team tested on four datasets:

  • CivilComments (5,000 entries, 8.0% toxic rate)
  • BeaverTails (3,021 entries, 57.4% toxic rate)
  • PKU-SafeRLHF (3,000 entries, 58.3% toxic rate)
  • Qwen3GuardTest (651 entries, 100% toxic rate)

The results show a token savings rate of 77.3%-96.1%, and the F1 score on Qwen3GuardTest reaches 100%, demonstrating excellent performance.

Section 05

Technical Implementation and Open-Source Contribution

The team has open-sourced the complete implementation, with core components including:

  1. SPRTDetector class: encapsulates the SPRT algorithm logic for easy integration;
  2. Calibration module: uses temperature scaling to calibrate classifier outputs;
  3. Experimental framework: provides experiment scripts and analysis tools.

Sample code allows quick integration of the detector and supports streaming detection.

Section 06

Practical Significance and Application Prospects

This framework fills a gap in streaming safety detection, and the open-source release lowers the barrier to adoption. Application scenarios include:

  • Online content moderation: real-time interception of harmful content;
  • Model safety assessment: red team testing tool;
  • Training data filtering: quick filtering of toxic samples;
  • Interactive AI systems: ensuring real-time safety of chatbots and others.

Section 07

Conclusion: Statistical Learning Theory Empowers AI Safety

The work of the Southeast University team demonstrates the potential of statistical learning theory in the field of AI safety. The SPRT framework has both theoretical guarantees and practical value, and its open-source implementation promotes the popularization of the technology, providing a theoretical and practical foundation for building safer and more reliable AI systems.