CausalDriveBench: A Causal Reasoning Evaluation Benchmark and Dataset Construction Framework for Autonomous Driving

A comprehensive benchmark for evaluating the causal reasoning capabilities of vision-language-action models in autonomous driving scenarios, supporting the nuScenes, OpenScene, and Argoverse V2 datasets, and providing a complete pipeline from raw data to causal scene graphs, question-answer pairs, and counterfactual trajectories.

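Tags: autonomous driving, causal reasoning, vision-language-action models, nuScenes, OpenScene, Argoverse, benchmark, counterfactual trajectories, causal scene graphs, ECCV2024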
Published 2026-05-02 02:39 | Recent activity 2026-05-02 02:53 | Estimated read 6 min

Section 01

[Introduction] CausalDriveBench: Overview of a Causal Reasoning Evaluation Benchmark for Autonomous Driving

CausalDriveBench is a causal reasoning evaluation benchmark for vision-language-action (VLA) models in autonomous driving, supporting three mainstream datasets: nuScenes, OpenScene, and Argoverse V2. It provides a complete construction pipeline from raw data to causal scene graphs, question-answer pairs, and counterfactual trajectories, aiming to fill the gap in causal reasoning evaluation in the autonomous driving field.

Section 02

Project Background and Research Motivation

The safety of an autonomous driving system depends not only on accurate perception and planning but, more crucially, on understanding the causal relationships between scene elements. Current end-to-end VLA models perform well in routine scenarios but often struggle in complex situations that require deep causal reasoning. CausalDriveBench is a research project created to fill this gap in evaluation.

Section 03

Core Capabilities and Technical Architecture

Supported Datasets

  • nuScenes: 120 scenes, ~4 samples per scene
  • OpenScene (NAVSIM): 100 scenes, using quartile sampling
  • Argoverse V2: 133 scenes, 5-camera configuration
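
The post does not spell out what "quartile sampling" means for OpenScene. One plausible reading, sketched below purely as an assumption, is to take one frame at each quarter mark of a scene. The function name `quartile_indices` and the choice of the 0%, 25%, 50%, and 75% positions are hypothetical and not taken from the project.

```python
def quartile_indices(num_frames: int) -> list[int]:
    """Return frame indices at the 0%, 25%, 50% and 75% marks of a scene.

    Assumption: this is only one possible interpretation of "quartile sampling".
    """
    if num_frames <= 0:
        return []
    # dict.fromkeys removes duplicates (very short scenes) while keeping order.
    return list(dict.fromkeys(num_frames * q // 4 for q in range(4)))


example_indices = quartile_indices(40)
```

Under this reading a 40-frame scene yields four evenly spaced samples, which would be in line with the roughly four samples per scene reported for nuScenes.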

Six-Stage Pipeline

  1. Record Construction: Convert raw data into a unified structure containing BEV rendering, multi-view images, agent states, etc.
  2. Causal Scene Graph Generation: Use multi-modal LLMs to generate structured graphs with 5 node types, multiple edge types, and causal states.
  3. Graph Pruning: Use a reverse BFS to remove interfering nodes that have no causal path to the ego vehicle.
  4. Causal Ladder QA: Generate three types of questions (active edges, dormant nodes, interfering nodes) based on Pearl's ladder of causation.
  5. Counterfactual Trajectory Generation: Generate counterfactual scenarios such as agent intervention and infrastructure intervention for specific questions.
  6. LLM Ego Vehicle Trajectory Prediction: Predict the ego vehicle trajectory from the intervention configuration; the nuPlan simulator can optionally be used instead.
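
The graph pruning stage can be illustrated with a minimal sketch. It assumes the causal scene graph is a set of directed (cause, effect) edges and that the ego vehicle is a node with the id "ego". The function name `prune_graph`, this data layout, and the example nodes are hypothetical; the project's actual schema is not shown in the post.

```python
from collections import deque


def prune_graph(nodes, edges, ego="ego"):
    """Keep only nodes that have a directed causal path to `ego`.

    nodes: iterable of node ids.
    edges: iterable of (cause, effect) pairs.
    """
    edges = list(edges)

    # Reverse adjacency: effect -> list of its direct causes.
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, []).append(cause)

    # BFS backwards from the ego vehicle along reversed edges.
    reachable = {ego}
    queue = deque([ego])
    while queue:
        current = queue.popleft()
        for cause in parents.get(current, []):
            if cause not in reachable:
                reachable.add(cause)
                queue.append(cause)

    kept_nodes = [n for n in nodes if n in reachable]
    kept_edges = [(c, e) for c, e in edges if c in reachable and e in reachable]
    return kept_nodes, kept_edges


# Illustrative scene: the billboard and parked truck never influence the ego vehicle.
nodes = ["ego", "lead_car", "traffic_light", "pedestrian", "parked_truck", "billboard"]
edges = [
    ("traffic_light", "lead_car"),
    ("lead_car", "ego"),
    ("pedestrian", "ego"),
    ("billboard", "parked_truck"),
]
kept_nodes, kept_edges = prune_graph(nodes, edges)
```

Searching backwards from the ego vehicle visits only nodes that can influence it, so a single traversal separates causally relevant nodes from distractors. Here `kept_nodes` retains the traffic light, lead car, and pedestrian, while the billboard and parked truck are dropped.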

Section 04

Technical Implementation Details

  • Batch API Cost Optimization: Use the Claude Batch API, at a cost of roughly $0.16 to $0.25 per sample
  • Dynamic Camera Ordering: Build IMAGE_ORDER_BLOCK dynamically to account for camera differences across datasets, so a single prompt set serves all of them
  • Visibility Filtering: Apply multi-ray 3D ray casting to filter out occluded vehicles in nuScenes data
  • Image Size Adaptation: Automatically resize AV2 camera images when they exceed Claude's limits
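
Two of the points above, the dynamic image order block and image size adaptation, can be sketched as follows. Only the name IMAGE_ORDER_BLOCK comes from the post. The wording of the block, the helper names `build_image_order_block` and `fit_within`, and the `max_side` value are assumptions for illustration; the real size limit should be taken from the API documentation.

```python
def build_image_order_block(camera_names):
    """Render a numbered list telling the model which image comes from which camera."""
    lines = ["Images are provided in the following order:"]
    for i, name in enumerate(camera_names, start=1):
        lines.append(f"{i}. {name}")
    return "\n".join(lines)


def fit_within(width, height, max_side):
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is preserved; images already within the limit are unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# The camera list differs per dataset; the prompt template itself stays the same.
IMAGE_ORDER_BLOCK = build_image_order_block(["CAM_FRONT", "CAM_BACK"])

# max_side=1024 is a placeholder value, not Claude's documented limit.
resized = fit_within(2048, 1550, max_side=1024)
```

Generating the block from the camera list means a dataset with a different camera count only needs a different list, not a separate prompt.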

Section 05

Visualization and Validation Tools

  • Interactive Visualization: HTML tool based on D3.js, which can render causal graphs, overlay camera images and BEV, display QA cards, and support switching between original and pruned graphs
  • Validation Script: A graph post-processing script for manual review and correction, which writes {scene_id}_verified.json as the reference graph
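
As a rough sketch of what the post-review step might involve, the code below runs a basic consistency check and then writes the graph to {scene_id}_verified.json. The graph layout (a "nodes" list with "id" fields and an "edges" list with "source" and "target" fields) and both function names are assumptions; only the output file name pattern comes from the post.

```python
import json
from pathlib import Path


def validate_graph(graph):
    """Return a list of problems; an empty list means the basic checks passed."""
    node_ids = {node["id"] for node in graph["nodes"]}
    problems = []
    for edge in graph["edges"]:
        for end in ("source", "target"):
            if edge[end] not in node_ids:
                problems.append(
                    f"edge {edge['source']}->{edge['target']}: unknown {end} '{edge[end]}'"
                )
    return problems


def save_verified(graph, scene_id, out_dir):
    """Write the reviewed graph to {scene_id}_verified.json and return its path."""
    problems = validate_graph(graph)
    if problems:
        raise ValueError("; ".join(problems))
    path = Path(out_dir) / f"{scene_id}_verified.json"
    path.write_text(json.dumps(graph, indent=2), encoding="utf-8")
    return path


# Hypothetical reviewed graph with a single causal edge.
reviewed = {
    "nodes": [{"id": "ego"}, {"id": "lead_car"}],
    "edges": [{"source": "lead_car", "target": "ego"}],
}
```

Refusing to save a graph with dangling edges keeps manual corrections from introducing references to nodes that were removed during review.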

Section 06

Research Value and Application Prospects

CausalDriveBench fills the gap in causal reasoning evaluation in the autonomous driving field and can be used to:

  1. Quantitatively compare the causal reasoning levels of different VLA models
  2. Identify model failure points in causal scenarios
  3. Expand training sets based on counterfactual trajectories to improve robustness
  4. Understand model decision-making basis through causal graph visualization

The benchmark is intended to help shift autonomous driving research from 'pattern recognition' toward 'causal understanding'.