Zing Forum


V2V-GoT: A Vehicle-to-Vehicle Collaborative Autonomous Driving Framework Based on Multimodal Large Language Models and Graph-of-Thoughts


Autonomous Driving, V2V Collaboration, V2V Communication, Multimodal Large Language Models, Graph-of-Thoughts, Occluded Perception, Trajectory Prediction, LLaVA, ICRA 2026
Published 2026-04-02 04:08 · Recent activity 2026-04-02 04:17 · Estimated read 4 min

Section 01

Introduction to the V2V-GoT Framework

V2V-GoT is the first Graph-of-Thoughts reasoning framework designed specifically for vehicle-to-vehicle (V2V) collaborative autonomous driving. It integrates multi-vehicle perception information via multimodal large language models to achieve occluded perception and planning-aware prediction, outperforming baseline methods in collaborative perception, prediction, and planning tasks.


Section 02

Background and Challenges

A core bottleneck of autonomous driving lies in the physical limits of single-vehicle perception: occlusions can hide road users and create safety hazards. Vehicle-to-vehicle (V2V) communication can expand the field of view, but traditional methods rely on simple feature fusion, which struggles to exploit the semantic correlations among multi-source information or to perform complex reasoning.
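To make the "simple feature fusion" baseline concrete, here is a minimal sketch (not the paper's code; all object IDs are made up) of naive late fusion: the ego vehicle simply unions detections shared over the V2V link with its own, which recovers occluded objects but carries no reasoning about why they were missed.

```python
# Hypothetical sketch of naive late fusion of V2V detections.
# Each vehicle shares a set of detected object IDs; the ego unions
# them, recovering objects occluded from its own field of view.

ego_detections = {"car_3", "pedestrian_7"}          # ego's own field of view
collaborator_detections = {"car_3", "cyclist_12"}   # shared over the V2V link

# Simple set-union fusion: no semantic reasoning about *why*
# cyclist_12 was missed (e.g. occlusion by a truck).
fused = ego_detections | collaborator_detections

# Objects the ego could not see itself.
occluded_recovered = fused - ego_detections
print(sorted(occluded_recovered))
```

This is exactly the kind of fusion V2V-GoT moves beyond: the union recovers the object, but nothing in it can explain or reason about the occlusion.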


Section 03

Core Methods and Innovations

V2V-GoT introduces Graph-of-Thoughts structured reasoning, decomposing the driving task into a graph of interlinked QA nodes. Two key innovations:

1. Occluded perception: identify occluded regions and infer hidden targets using information shared by other vehicles.
2. Planning-aware prediction: predict the behavior of other traffic participants conditioned on the ego vehicle's candidate trajectories.
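A Graph-of-Thoughts of this kind can be modeled as a DAG of QA nodes evaluated in dependency order. The sketch below (node names are illustrative, not taken from the paper) shows the structural idea using Python's standard-library topological sorter; in V2V-GoT each node would be answered by the multimodal LLM conditioned on its parents' answers.

```python
# Hypothetical sketch of Graph-of-Thoughts structured reasoning:
# QA "thought" nodes form a DAG, and each node's answer may depend
# on the answers of its parent nodes.

from graphlib import TopologicalSorter

# node -> set of parent nodes (dependencies); names are illustrative
got = {
    "occluded_perception": {"ego_perception", "collab_perception"},
    "planning_aware_prediction": {"occluded_perception",
                                  "ego_candidate_trajectories"},
    "final_plan": {"planning_aware_prediction"},
}

# Evaluate nodes only after all of their parents have been answered.
order = list(TopologicalSorter(got).static_order())

answers = {}
for node in order:
    parents = got.get(node, set())
    # Placeholder for an LLM call conditioned on the parents' answers;
    # here we just record the information flow.
    answers[node] = f"{node} <- {sorted(parents)}"

print(order)
```

The topological order guarantees, for example, that occluded perception is resolved before planning-aware prediction consumes it, mirroring the paper's perception-then-prediction-then-planning flow.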


Section 04

Dataset and Model Training

The authors construct the V2V-GoT-QA dataset (built on V2V4Real, containing multi-vehicle perception features and QA sequences) and fine-tune LLaVA 1.5 with LoRA for 10 epochs to adapt it to the V2V collaborative driving domain.
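The core of LoRA fine-tuning is that the frozen weight W is adapted as W' = W + (alpha / r) · B·A, where only the low-rank factors A (r × in) and B (out × r) are trained. The pure-Python toy below (not the actual LLaVA training code; the tiny 2×2 matrices are made up) just computes that update once.

```python
# Minimal sketch of the LoRA low-rank update: W' = W + (alpha/r) * B @ A.
# Only A and B would be trained; W stays frozen.

def matmul(X, Y):
    """Naive matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

r, alpha = 2, 4                      # LoRA rank and scaling factor
W = [[1.0, 0.0], [0.0, 1.0]]         # frozen base weight (2x2)
A = [[0.5, 0.0], [0.0, 0.5]]         # trainable, r x in_features
B = [[1.0, 0.0], [0.0, 1.0]]         # trainable, out_features x r

scale = alpha / r
BA = matmul(B, A)
W_adapted = [[w + scale * d for w, d in zip(wr, dr)]
             for wr, dr in zip(W, BA)]

print(W_adapted)  # W plus the scaled low-rank update
```

In practice this would be done through a library such as Hugging Face PEFT rather than by hand; the point here is only the arithmetic that makes LoRA cheap: the number of trained parameters scales with the rank r, not with the full weight size.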


Section 05

Experimental Result Analysis

V2V-GoT outperforms baselines on collaborative perception, prediction, and planning tasks, with especially large gains in occlusion scenarios. The Graph-of-Thoughts structure also provides interpretability: reasoning paths are traceable, and domain knowledge such as traffic rules is easy to integrate.


Section 06

Open Source Support and Future Outlook

The project is open source (code and datasets on GitHub, with the datasets hosted on Hugging Face). Future work could extend the framework to vehicle-to-infrastructure (V2I) communication and multimodal sensor fusion, offering a template for combining large-model reasoning with physical perception.