Section 01
DocSeeker Framework: A Core Solution for Long-Document Understanding
The DocSeeker framework targets two problems that multimodal large models face in long-document understanding: a low signal-to-noise ratio and weak supervision signals. It combines a three-stage "Analysis-Localization-Reasoning" workflow with a two-stage training strategy, which enables robust generalization from short-document training to ultra-long documents. By focusing on structured visual reasoning and evidence localization, the framework offers an effective technical path for long-document processing.
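To make the shape of the "Analysis-Localization-Reasoning" workflow concrete, the following is a minimal sketch and not DocSeeker's actual implementation. It assumes pages are available as plain text and replaces the multimodal model with naive keyword overlap. The names `Page`, `analyze`, `localize`, `reason`, and `answer_question` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    text: str  # assumption: stand-in for the page image a multimodal model would see


def _tokens(text: str) -> set[str]:
    # Lowercase words with surrounding punctuation stripped.
    return {w.strip("?.,").lower() for w in text.split() if w.strip("?.,")}


def analyze(question: str) -> set[str]:
    """Stage 1 (Analysis): decide what to look for.

    Hypothetical stand-in: keep the non-stopword terms of the question.
    """
    stopwords = {"the", "a", "an", "of", "in", "is", "what", "which", "was", "for"}
    return _tokens(question) - stopwords


def localize(pages: list[Page], keywords: set[str], top_k: int = 2) -> list[Page]:
    """Stage 2 (Localization): keep only the pages likely to hold evidence.

    Dropping irrelevant pages is what raises the signal-to-noise ratio.
    """
    def score(page: Page) -> int:
        return len(keywords & _tokens(page.text))

    ranked = sorted(pages, key=score, reverse=True)
    return [p for p in ranked[:top_k] if score(p) > 0]


def reason(question: str, evidence: list[Page]) -> dict:
    """Stage 3 (Reasoning): answer from the localized pages only.

    Hypothetical stand-in: return the evidence instead of a generated answer.
    """
    return {
        "question": question,
        "evidence_pages": [p.number for p in evidence],
        "context": " ".join(p.text for p in evidence),
    }


def answer_question(question: str, pages: list[Page]) -> dict:
    # Chain the three stages in order.
    keywords = analyze(question)
    evidence = localize(pages, keywords)
    return reason(question, evidence)


pages = [
    Page(1, "Introduction and overview"),
    Page(2, "Revenue in 2021 was 5 million"),
    Page(3, "Appendix with references"),
]
result = answer_question("What was the revenue in 2021?", pages)
```

In this toy run only page 2 survives localization, so the reasoning stage sees a single relevant page rather than the whole document. In the framework described above, a trained multimodal model would perform all three stages; the sketch only illustrates how the stages hand off to one another.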