Section 01
SentGuard: A Sentence-Level Streaming Guard for Real-Time LLM Security Moderation
SentGuard is a sentence-level approach to streaming content moderation for LLMs. It holds generated text in a lightweight waiting buffer and checks for security risks at each sentence boundary. Across five security benchmarks, it reports a 90.5% detection rate and a 7.41% false positive rate, aiming to balance the timeliness and accuracy of moderation during streaming generation.
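The summary does not specify SentGuard's boundary rule, buffer policy, or classifier, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a simple regex to detect sentence ends (punctuation followed by whitespace). It assumes a buffer that withholds text until the sentence containing it passes the check. In place of the trained guard model it uses a hypothetical `is_unsafe(context, sentence)` callback. `SentenceStreamGuard`, `feed`, and `flush` are illustrative names, not SentGuard's actual API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed boundary rule: '.', '!' or '?' followed by whitespace ends a sentence.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


@dataclass
class SentenceStreamGuard:
    """Hypothetical sentence-level streaming guard (illustrative sketch only)."""

    # Stand-in for a trained guard model: (released_context, sentence) -> flagged?
    is_unsafe: Callable[[str, str], bool]
    buffer: str = ""                      # waiting buffer for the unfinished sentence
    released: List[str] = field(default_factory=list)
    blocked: bool = False

    def _check(self, sentence: str) -> bool:
        """Judge one complete sentence; release it if safe, else block the stream."""
        context = " ".join(self.released)
        if self.is_unsafe(context, sentence):
            self.blocked = True
            return False
        self.released.append(sentence)
        return True

    def feed(self, token: str) -> List[str]:
        """Append a streamed token; return sentences that completed and passed."""
        if self.blocked:
            return []
        self.buffer += token
        # Every part except the last is a finished sentence; the last stays buffered.
        *complete, self.buffer = _BOUNDARY.split(self.buffer)
        out: List[str] = []
        for raw in complete:
            sentence = raw.strip()
            if not sentence:
                continue
            if not self._check(sentence):
                self.buffer = ""          # drop everything after a flagged sentence
                break
            out.append(sentence)
        return out

    def flush(self) -> List[str]:
        """At end of generation, judge whatever remains in the buffer."""
        if self.blocked or not self.buffer.strip():
            return []
        sentence, self.buffer = self.buffer.strip(), ""
        return [sentence] if self._check(sentence) else []


def demo():
    # Toy keyword rule standing in for a real classifier (hypothetical).
    guard = SentenceStreamGuard(is_unsafe=lambda ctx, s: "password" in s.lower())
    tokens = ["The weather ", "is nice. ", "Here is the ", "admin password: ",
              "hunter2. ", "Bye."]
    shown: List[str] = []
    for tok in tokens:
        shown += guard.feed(tok)
    shown += guard.flush()
    return shown, guard.blocked


if __name__ == "__main__":
    print(demo())  # (['The weather is nice.'], True)
```

In this sketch the user sees output at most one sentence late, and the classifier always judges a complete sentence together with the context already released. That is one plausible reading of the timeliness/accuracy trade-off described above; the real system may size its buffer or choose boundaries differently.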