Section 01
Archon: A Unified Multimodal Model for Holistic Digital Human Generation
Archon is a human-centric unified multimodal model developed by the ZJU 3DV Lab (arXiv, 2026). It integrates seven modalities (text, audio, action, facial expression, mouth movement, image, and video). It introduces two techniques, semantic video reparameterization and a modality thinking chain, to achieve end-to-end, high-quality holistic digital human generation. The model is designed to overcome the limitations of existing modular digital human solutions and to address key technical challenges in the field.