Section 01
[Introduction] TraceSafe: Key Points from a Systematic Evaluation of Safety Guardrails for Multi-step Tool-Calling Trajectories
This article examines the safety of intermediate trajectories in multi-step tool calling by LLM agents, an area that existing guardrail evaluations have left largely uncovered. Its key contributions are TraceSafe-Bench, proposed as the first trajectory-level safety benchmark (12 risk categories, 1,000+ instances), and three findings: (1) guardrail effectiveness depends on the ability to handle structured data rather than on semantic alignment; (2) model architecture matters more than model scale; and (3) detection accuracy improves as the number of execution steps in a trajectory grows.