Section 01
[Introduction] AssemLM: A Spatial Reasoning Multimodal Large Language Model for Robotic Assembly Tasks
AssemLM is a spatial reasoning multimodal large language model designed specifically for robotic assembly tasks, proposed by the China Telecom Artificial Intelligence Research Institute in collaboration with Fudan University, Tianjin University, Northwestern Polytechnical University, and City University of Hong Kong. By integrating assembly manuals, point cloud data, and text instructions, it infers and predicts key 6D assembly poses. The model achieves leading performance on the AssemBench benchmark of over 900,000 samples, offering an effective technical path for applying embodied intelligence to industrial assembly.
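A "6D assembly pose" here means an object's 3D position plus its 3D orientation. The sketch below (not from the AssemLM paper; all names are illustrative) shows the standard way such poses are represented and composed as 4x4 homogeneous transforms:

```python
import numpy as np

def pose_to_matrix(rotation, translation):
    """Pack a 3x3 rotation matrix and a 3-vector translation
    into a 4x4 homogeneous transform (a 6D pose)."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def compose(T_ab, T_bc):
    """Chain two poses: the pose of frame c in frame a."""
    return T_ab @ T_bc

# Hypothetical example: rotate 90 degrees about z, then translate 1 unit along x.
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
T1 = pose_to_matrix(Rz90, [0.0, 0.0, 0.0])
T2 = pose_to_matrix(np.eye(3), [1.0, 0.0, 0.0])
T = compose(T1, T2)  # the x-translation is rotated into +y
```

A model predicting a 6D pose typically outputs some parameterization of this transform (e.g. a translation vector plus a quaternion or rotation representation), which is then converted to a matrix like the one above for execution by the robot.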