Section 01
CT-1 Model Core Guide: A Spatial Intelligence for Video Generation That Truly Understands Camera Motion
CT-1 is a joint vision-language-camera model that transfers spatial reasoning knowledge to video generation, enabling camera-controllable video generation aligned with user intent. Alongside the model, the team has released the CT-200K dataset, containing 47 million frames. At its core is a two-stage paradigm, "Camera First, Generation Second", which addresses two problems in existing video generation: ambiguous camera control and the lack of spatial reasoning.
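To make the "Camera First, Generation Second" paradigm concrete, here is a minimal, purely illustrative sketch of such a two-stage pipeline. All names (`plan_camera`, `generate_video`, `CameraPose`) and the rule-based planner are assumptions for illustration, not CT-1's actual API: stage one turns a text intent into an explicit per-frame camera trajectory, and stage two conditions frame synthesis on that trajectory.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class CameraPose:
    # Simplified camera state: 3-DoF position plus yaw,
    # standing in for a full 6-DoF extrinsic matrix.
    x: float
    y: float
    z: float
    yaw: float

def plan_camera(intent: str, num_frames: int) -> List[CameraPose]:
    """Stage 1 ("Camera First"): map a text intent to an explicit
    per-frame camera trajectory before any pixels are generated.
    Hypothetical rule-based stand-in for a learned planner."""
    if "orbit" in intent:
        # Circle around the subject at unit radius.
        return [
            CameraPose(
                x=math.cos(2 * math.pi * i / num_frames),
                y=0.0,
                z=math.sin(2 * math.pi * i / num_frames),
                yaw=360.0 * i / num_frames,
            )
            for i in range(num_frames)
        ]
    # Default: dolly forward along the z axis.
    return [CameraPose(0.0, 0.0, 0.1 * i, 0.0) for i in range(num_frames)]

def generate_video(prompt: str, trajectory: List[CameraPose]) -> List[str]:
    """Stage 2 ("Generation Second"): synthesize each frame conditioned
    on its planned pose; here a placeholder string stands in for pixels."""
    return [
        f"frame {i}: '{prompt}' from pose "
        f"({p.x:.2f}, {p.y:.2f}, {p.z:.2f}, yaw={p.yaw:.1f})"
        for i, p in enumerate(trajectory)
    ]

traj = plan_camera("slow orbit around the subject", num_frames=8)
frames = generate_video("a lighthouse at dusk", traj)
print(len(frames))  # 8
```

The key design point the sketch illustrates is the separation of concerns: camera motion becomes an explicit, inspectable intermediate representation rather than an implicit property entangled in the generated pixels, which is what makes the control unambiguous.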