Section 01
[Original Post] Implicit Trait Guidance: Addressing the Problem of Alignment Contagion in Multi-Agent Systems
This article describes "alignment contagion" in multi-agent interactions: harmful behaviors spread from agent to agent until the system's overall value alignment collapses. To counter it, the article proposes Implicit Trait Guidance, a technique that can effectively maintain the value alignment of large language models without requiring access to the models' internals. The technique offers a new approach to AI safety in multi-agent scenarios.