Zing Forum


V2X-QA: A Multimodal Large Model Reasoning Dataset and Benchmark for V2X Cooperative Autonomous Driving

V2X-QA is a multi-view visual question answering dataset for autonomous driving, built from real-world scenarios. It supports controlled evaluation from three perspectives (vehicle-side, infrastructure-side, and cooperative), and the project also releases V2X-MoE, a MoE baseline model built on Qwen3-VL.

Tags: V2X · Autonomous Driving · Multimodal Large Model · Vehicle-Infrastructure Cooperation · Visual Question Answering · Dataset · Qwen3-VL · MoE
Published 2026-04-06 11:12 · Recent activity 2026-04-06 11:18 · Estimated read: 6 min

Section 01

[Introduction] Overview of V2X-QA Dataset and V2X-MoE Baseline Model

V2X-QA is a multimodal large model reasoning dataset and benchmark for V2X cooperative autonomous driving, built from real scenarios and supporting controlled evaluation from three perspectives: vehicle-side, infrastructure-side, and cooperative. The project also releases V2X-MoE, a MoE baseline model built on Qwen3-VL, providing a new evaluation dimension for applying multimodal large models to autonomous driving.


Section 02

Project Background and Core Positioning

Autonomous driving is shifting from single-vehicle intelligence to vehicle-infrastructure cooperation (V2X), and integrating vehicle-side close-range detail with infrastructure-side global perception is a key challenge. Traditional datasets mostly cover a single perspective. V2X-QA is built on V2X-Seq-SPD and, for the first time, integrates three perspectives—vehicle-side (VS), infrastructure-side (IS), and cooperative (CO)—into a unified VQA framework, enabling precise quantification of how much each information source contributes to model reasoning.


Section 03

Dataset Architecture and Task Design

V2X-QA includes 12 view-aligned tasks covering three levels: perception (recognizing traffic participants, signs, etc.), prediction (trajectory inference), and reasoning and planning (driving decision-making). Each task has evaluation subsets for all three perspectives, and annotations are stored in JSONL format (including questions, options, answers, and image paths). Due to license restrictions, the original images must be downloaded separately through the official V2X-Seq-SPD channel.
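Since the annotations are JSONL (one example per line), loading them is a one-pass parse. A minimal sketch, assuming field names such as `question`, `options`, `answer`, and `image_path` — the actual keys may differ from the project's released files:

```python
import json

def load_v2x_qa(path):
    """Load V2X-QA-style annotations from a JSONL file, one example per line.

    Each line is assumed to be a JSON object with keys like
    'question', 'options', 'answer', and 'image_path' (hypothetical names).
    """
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                examples.append(json.loads(line))
    return examples
```

Because images ship separately, the `image_path` values would be resolved against the locally downloaded V2X-Seq-SPD image root before evaluation.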


Section 04

V2X-MoE Baseline Model Design

V2X-MoE is a MoE model based on Qwen3-VL. It uses an explicit view-routing mechanism and contains three LoRA expert modules, one each for the vehicle-side, infrastructure-side, and cooperative perspectives. During inference, the expert corresponding to the question's perspective is activated, avoiding the performance degradation that occurs when a single model must adapt to multiple data distributions. Training proceeds in three stages—joint MCQA training → cooperative-view fine-tuning → infrastructure-side view enhancement—balancing general and specialized capabilities.
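The explicit view routing described above can be sketched as a simple lookup from the question's perspective tag to a LoRA adapter name. This is a hypothetical illustration — the adapter names and routing interface in the actual V2X-MoE code are assumptions:

```python
# Hypothetical mapping from perspective tag to LoRA expert adapter name.
# The real V2X-MoE adapter identifiers may differ.
EXPERTS = {
    "VS": "lora_vehicle",          # vehicle-side expert
    "IS": "lora_infrastructure",   # infrastructure-side expert
    "CO": "lora_cooperative",      # cooperative expert
}

def route_expert(view_tag):
    """Return the LoRA adapter to activate for a question's perspective tag."""
    try:
        return EXPERTS[view_tag]
    except KeyError:
        raise ValueError(f"unknown view tag: {view_tag!r}")
```

In a LoRA-based setup (e.g. with a library like PEFT), the returned name would be passed to an adapter-switching call before generation, so each question is answered by the expert trained on its distribution.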


Section 05

Technical Implementation and Reproduction Guide

The project provides complete training and evaluation scripts (supporting Conda/venv environments). Training scripts live in the model/train/ directory (one per stage), and the evaluation script v2x_moe_eval_mcqa_qwen3.py can load pre-trained checkpoints directly. The checkpoints include the three expert LoRA weights and their configurations, so users can reproduce results without training from scratch. Note: annotation files, scripts, and checkpoints are maintained by the project; the original images and base models must be obtained under the upstream agreements.
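At the core of any MCQA evaluation script is a simple exact-match accuracy over the predicted options. A minimal sketch of that metric — the actual internals of v2x_moe_eval_mcqa_qwen3.py are not shown here and may compute more than this:

```python
def mcqa_accuracy(predictions, answers):
    """Exact-match accuracy for multiple-choice QA.

    predictions / answers are parallel lists of option labels (e.g. "A".."D").
    Returns the fraction of predictions matching the gold answer.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have equal length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

For the controlled three-perspective evaluation, this metric would be computed separately on the VS, IS, and CO subsets so the contribution of each information source can be compared.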


Section 06

Research Value and Application Prospects

V2X-QA fills a gap in the evaluation of multimodal large models for V2X cooperation. Compared with traditional datasets, it emphasizes higher-level reasoning (understanding scenarios and making decisions), which aligns with the direction multimodal large models are heading. In practice, it provides a standardized testing platform for iterating V2X cooperative algorithms, and can be used to evaluate vehicle-side perception optimization, roadside deployment strategies, cloud fusion algorithms, and more.


Section 07

Summary and Outlook

V2X-QA provides important infrastructure for V2X cooperative research through its multi-view VQA dataset and MoE baseline model. Its controlled evaluation design, modular architecture, and open-source implementation reflect a solid understanding of the domain's needs. For researchers and engineers it is both a benchmark tool and a reference framework, and we look forward to the project's continued iteration and its contribution to the field.