Section 01
[Introduction] IPO-Mine: Release of a Long-Text Multimodal IPO Document Analysis Toolkit and Dataset
This article introduces the open-source IPO-Toolkit framework and the IPO-Dataset. The dataset covers more than 109,000 IPO filings and amendments from 1994 to 2026 and includes more than 76,000 images. The study finds that current multimodal models diverge significantly from human experts' judgments when processing ultra-long regulatory documents, and it provides an important benchmark for multimodal reasoning research on financial documents.