Section 01
[Introduction] CrashChat: A Multimodal Large Language Model Focused on Traffic Accident Video Analysis
CrashChat is a multimodal large language model designed specifically for traffic accident video analysis and built on the VideoLLaMA3 architecture. It supports six core tasks, including accident recognition, temporal localization, causal reasoning, and prevention recommendation generation. For instruction fine-tuning, the project builds a dataset of 18,385 videos and 96,184 question-answer pairs. The work has been accepted at ICPR 2026, and the code, model weights, and dataset are open-source. Potential applications include intelligent traffic monitoring and insurance claims settlement.