Section 01
ParaVT Framework Guide: A Multi-Agent Parallel Video Tool Calling Solution to Resolve the Tool Prior Paradox
ParaVT is the first multi-agent parallel video tool-calling framework trained end to end with reinforcement learning (RL). Its core innovation is invoking multiple time-window cropping tools simultaneously within a single dialogue turn, which addresses three problems of serial tool calling: error propagation, context contamination, and inference cost. To tackle the tool prior paradox, the framework proposes the PARA-GRPO algorithm, and it achieves an average performance improvement of 7.9% across 6 long-video understanding benchmarks.
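To make the idea of parallel calling within one turn concrete, here is a minimal Python sketch. It is not ParaVT's implementation: the names `CropCall`, `crop_window`, and `run_parallel_turn` are hypothetical, the cropping tool is a stand-in that only echoes the requested window, and a thread pool is assumed as the dispatch mechanism purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class CropCall:
    """One hypothetical time-window crop request, in seconds."""
    start: float
    end: float


def crop_window(call: CropCall) -> dict:
    # Stand-in for a real video cropping tool: it only reports the window
    # it was asked for. A real tool would return frames or a clip handle.
    if call.end <= call.start:
        raise ValueError(f"empty window: {call}")
    return {"start": call.start, "end": call.end, "duration": call.end - call.start}


def run_parallel_turn(calls: list[CropCall]) -> list[dict]:
    # Every call in the turn is fixed before any of them runs, so the result
    # of one window is never used to build the arguments of another.
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        # map() returns results in input order, regardless of completion order.
        return list(pool.map(crop_window, calls))


# One dialogue turn that requests three windows at once (illustrative values).
turn = [CropCall(0.0, 30.0), CropCall(120.0, 150.0), CropCall(300.0, 345.0)]
results = run_parallel_turn(turn)
```

In a serial design, each crop would be chosen only after the previous result came back, so an early mistake could steer every later call. Here the three windows are requested together and returned as one ordered batch.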