Section 01
[Introduction] LMM-Track4D: A New Breakthrough in Unleashing 4D Dynamic Reasoning Capabilities of Multimodal Models
This article introduces LMM-Track4D, a model that uses a trajectory-anchored dialogue paradigm to close the gap in multimodal models' ability to reason about continuous spatiotemporal dynamics in 4D (3D space plus time). The model integrates three core components: Ray-Time Geometric Encoding (RTGE), Long-Range Dynamic State Tokens (TRK), and the Object Slot Kinematic Residual Anchoring Decoder (OSK-RA). The work also releases the Track4D-Bench benchmark dataset, which provides a systematic framework for evaluating 4D reasoning capabilities.