Section 01
SilentBench: A Systematic Benchmark Revealing the "Output Suppression" Phenomenon in Large Language Models (Introduction)
SilentBench is the first open-source benchmark dedicated to studying the "output suppression" phenomenon in large language models. By comparing base models with their instruction-tuned counterparts, it shows that RLHF training produces consistent suppression signatures across specific categories. This article covers the background, methodology, evidence, conclusions, and future directions.