[Introduction] Panoramic View of Multimodal Intelligence: Technological Evolution and Resource Compilation from VLM to Embodied AI
The Awesome-Multimodal-Intelligence project systematically organizes the key technical directions in multimodal intelligence across four categories: Vision-Language Models (VLM), Vision-Language-Action Models (VLA), world models, and embodied AI. It offers researchers and developers a comprehensive resource index for quickly grasping the field's technological evolution and cutting-edge trends.