Zing Forum

Major Breakthrough in Large-Scale Codec Avatars Technology: High-Fidelity 3D Digital Humans via Million-Scale Pre-Training

Meta's latest research result, Large-Scale Codec Avatars (LCA), applies large-scale pre-training to 3D digital humans for the first time. Its pre-training/post-training paradigm addresses the long-standing tension between high fidelity and generalization.

Tags: 3D avatar, digital human, pretraining, computer vision, generative AI, Codec Avatars, Meta, virtual reality, AR/VR
Published 2026-04-03 01:58 | Recent activity 2026-04-03 11:18 | Estimated read 6 min

Section 01

Introduction: High-Fidelity 3D Digital Humans via Million-Scale Pre-Training

Meta's latest research result, Large-Scale Codec Avatars (LCA), brings the large-model pre-training paradigm to 3D digital humans for the first time. Its two-stage pre-training/post-training strategy addresses the long-standing tension between high fidelity and generalization. The model is pre-trained on million-scale in-the-wild videos to acquire general knowledge, then post-trained on high-quality data to refine detail. At inference time, a single feed-forward pass generates a high-fidelity full-body 3D digital human, which opens new possibilities for VR/AR and remote collaboration.

Section 02

Background: The Dilemma of 3D Digital Human Modeling

High-fidelity 3D digital human modeling has long faced a trade-off between fidelity and generalization. Methods trained on studio data capture rich detail but generalize poorly, so they struggle to adapt to diverse real-world scenarios. Models trained on millions of in-the-wild samples generalize well, but 3D ambiguity in that data leaves them with low quality and a lack of realism. At its core, this is a conflict between the scarcity of high-quality annotated data and the diversity the real world demands, and it has limited practical use of the technology.

Section 03

Method: LCA's Two-Stage Pre-Training/Post-Training Strategy

The LCA method proposed by Meta draws on experience from large-model pre-training and trains in two stages. In the pre-training stage, it uses 1 million in-the-wild videos to learn general representations such as human body shape and facial structure, building up broad priors. In the post-training stage, it fine-tunes on selected high-quality data, focusing on expressiveness and fidelity. The strategy combines the generalization benefits of large-scale data with the fine-grained refinement that a small, high-quality dataset provides.
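
The two-stage schedule can be illustrated with a toy sketch. This is not LCA's training code: it assumes a scalar linear model, per-sample SGD, and synthetic data, and every name and number in it (`sgd`, `run_two_stage_demo`, the learning rate, the sample counts, the label bias standing in for low-quality in-the-wild supervision) is hypothetical. It only shows the handoff between stages: post-training starts from the pre-trained weights rather than from scratch, and is compared against training from scratch on the same small clean set with the same budget.

```python
import random

TRUE_W, TRUE_B = 2.0, 1.0  # toy "ground truth" the model should recover


def sgd(params, data, lr, epochs):
    """Plain per-sample SGD on squared error for y = w * x + b."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mse(params, data):
    """Mean squared error of the linear model on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def grid(n):
    """n evenly spaced points on [-1, 1]."""
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def run_two_stage_demo(seed=0):
    rng = random.Random(seed)

    # Stage-1 data (assumption): large but imperfect, with bounded noise
    # and a constant label bias of 0.3.
    wild = []
    for _ in range(2000):
        x = rng.uniform(-1.0, 1.0)
        y = TRUE_W * x + TRUE_B + 0.3 + rng.uniform(-0.3, 0.3)
        wild.append((x, y))

    # Stage-2 data (assumption): small and clean.
    clean = [(x, TRUE_W * x + TRUE_B) for x in grid(20)]
    held_out = [(x, TRUE_W * x + TRUE_B) for x in grid(11)]

    # Stage 1: pre-train from zero on the large, imperfect set.
    pretrained = sgd((0.0, 0.0), wild, lr=0.01, epochs=3)
    # Stage 2: post-train on the clean set, starting from stage-1 weights.
    post_trained = sgd(pretrained, clean, lr=0.01, epochs=50)
    # Baseline: same clean set and budget, but no pre-training.
    scratch = sgd((0.0, 0.0), clean, lr=0.01, epochs=50)

    return {
        "pretrained": mse(pretrained, held_out),
        "post_trained": mse(post_trained, held_out),
        "scratch": mse(scratch, held_out),
    }


if __name__ == "__main__":
    for name, err in run_two_stage_demo().items():
        print(f"{name:>12}: held-out MSE = {err:.6g}")
```

In this toy setting the pre-trained model carries the bias of its imperfect labels, post-training removes most of it, and the from-scratch baseline ends with a higher held-out error under the same fine-tuning budget. A real avatar model differs in every detail; only the ordering of the stages carries over.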

Section 04

Technical Highlights: Efficient Inference and Strong Control Capabilities

The core advantage of LCA is feed-forward generation: a single pass produces a high-fidelity full-body 3D digital human, which greatly improves efficiency. It offers precise, fine-grained control over facial expressions and finger-level control over joint motion, preserving identity while rendering rich expressions and gestures. It also supports relighting, natural deformation of loose clothing, and zero-shot robustness to stylized images, capabilities that reflect the general representations learned during pre-training.

Section 05

Application Prospects: Practical Significance in Multiple Fields

LCA opens new possibilities in VR/AR (personalized high-fidelity avatars), remote collaboration (conveying non-verbal cues to make communication more efficient), and the entertainment industry (efficient generation of realistic characters). Because it relies on feed-forward inference, it is a good fit for deployment on edge devices, and high-fidelity digital human generation may eventually run in real time on consumer-grade hardware.

Section 06

Limitations and Future Research Directions

LCA still has limitations. Collecting and annotating million-scale pre-training data is costly, and performance under extreme lighting and complex occlusion needs improvement. Future directions include more efficient use of data (semi-supervised or self-supervised learning), better real-time performance and computational efficiency, and extending the pre-training paradigm to other 3D content generation tasks such as scene and object modeling.

Section 07

Conclusion: A New Stage of 3D Digital Human Technology

The introduction of LCA marks a new stage in 3D digital human technology. By balancing high fidelity and generalization, it addresses a long-standing technical problem and lays the groundwork for more intelligent and realistic virtual interaction. As the technology matures, high-fidelity digital humans are expected to play a larger role in daily life.