Zing Forum

P3D-Bench: A Parametric 3D Generation Benchmark for Multimodal Large Models

P3D-Bench is the first comprehensive benchmark for parametric 3D generation. It evaluates how well multimodal large language models (MLLMs) generate precise geometry, semantically aligned components, and structural assemblies via code, across three core tasks: text-to-3D, image-to-3D, and assembly generation.

Tags: multimodal large models, 3D generation, parametric modeling, code generation, benchmark, assemblies, geometric reasoning, CAD
Published 2026-06-10 01:36 | Recent activity 2026-06-10 11:55 | Estimated read 7 min

Section 01

P3D-Bench: The First Comprehensive Benchmark for Parametric 3D Generation of Multimodal Large Models

Original Author/Maintainer: arXiv authors
Source Platform: arXiv
Original Title: P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning
Original Link: http://arxiv.org/abs/2606.11152v1
Release Time: 2026-06-09T17:36:34Z


Section 02

Paradigm Shift in 3D Generation: From Non-Parametric Representations to Parametric Code Generation

The code generation capability of multimodal large language models opens up a new path for 3D modeling. Traditional 3D generation methods output non-parametric representations such as meshes or point clouds, which capture appearance but are hard to interpret and edit. Generating parametric 3D models as code has clear advantages: the code is readable, geometric parameters are explicit, and designs can be modified and iterated on directly.
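
To make "parametric" concrete, here is a minimal sketch in plain Python with no CAD library. The `Box`, `TableParams`, and `build_table` names and the box-based representation are illustrative assumptions, not the code format P3D-Bench uses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: minimum corner (x, y, z) plus size (dx, dy, dz)."""
    name: str
    origin: tuple
    size: tuple


@dataclass(frozen=True)
class TableParams:
    """Hypothetical design parameters, all in millimetres."""
    top_width: float = 1200.0
    top_depth: float = 600.0
    top_thickness: float = 30.0
    leg_height: float = 700.0
    leg_side: float = 50.0


def build_table(p: TableParams) -> list:
    """Return the table as named boxes derived only from the parameters."""
    parts = [Box("top", (0.0, 0.0, p.leg_height),
                 (p.top_width, p.top_depth, p.top_thickness))]
    # One leg at each corner of the top, flush with its edges.
    xs = (0.0, p.top_width - p.leg_side)
    ys = (0.0, p.top_depth - p.leg_side)
    for i, (x, y) in enumerate((x, y) for x in xs for y in ys):
        parts.append(Box(f"leg_{i}", (x, y, 0.0),
                         (p.leg_side, p.leg_side, p.leg_height)))
    return parts


default_table = build_table(TableParams())
# Editing the design is a one-parameter change; every dependent part follows.
tall_table = build_table(replace(TableParams(), leg_height=900.0))
```

Changing `leg_height` moves the top and resizes all four legs together, which is the kind of edit a raw mesh or point cloud does not support directly.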

However, this places higher demands on models: they must generate runnable code while ensuring precise geometry, semantic alignment with the input, and plausible assembly relations. This tests whether a model understands design structure rather than merely imitating appearance.


Section 03

Core Evaluation Task Families of P3D-Bench

The evaluation system of P3D-Bench covers three core task families:

  1. Text-to-3D: Generate parametric 3D code from natural language descriptions, testing semantic understanding, geometric mapping, and code writing abilities;
  2. Image-to-3D: Infer 3D structures and parametric representations from single or multiple images, requiring visual understanding and spatial reasoning abilities;
  3. Assembly Generation: Generate multi-component assemblies and handle inter-component assembly relations, directly testing combinatorial reasoning and structured generation abilities.

Section 04

Multi-Dimensional Evaluation Metrics of P3D-Bench

P3D-Bench evaluates generation quality along seven metric dimensions; the six described in this summary are:

  1. Executability: Whether the code runs without errors;
  2. Geometric Fidelity: How closely the generated geometry matches the target, measured with metrics such as Chamfer distance;
  3. Topological Correctness: Check for issues such as non-manifold edges and self-intersecting faces;
  4. Text Constraint Satisfaction: Whether explicit constraints (e.g., size, proportion) in the input description are satisfied;
  5. Multi-View Semantic Alignment: Semantic consistency between rendered images from different views and the input;
  6. Component-Level Structure: Whether the component geometry, quantity, and assembly relations of multi-component models are reasonable.
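
As an illustration of the geometric fidelity metric, the sketch below computes a symmetric Chamfer distance between two point sets in pure Python. Chamfer distance has several variants; this one assumes squared distances averaged per direction and then summed, which may differ from the exact definition P3D-Bench uses. The sample points are made up.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 3D point lists.

    Averages, in each direction, the squared distance from every point to its
    nearest neighbour in the other set. Brute force, O(len(a) * len(b)).
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.1, y, z) for x, y, z in target]

print(chamfer_distance(target, target))   # identical sets give 0.0
print(chamfer_distance(target, shifted))  # small positive value
```

In practice the point sets would be sampled from the surfaces of the generated and reference shapes, and a spatial index would replace the brute-force nearest-neighbour search.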

Section 05

Evaluation Findings: Three Key Shortcomings of Current Models

The research team evaluated cutting-edge models on 400 text cases, 400 image cases, and 203 annotated assemblies, leading to three key findings:

  1. The assembly task is the most challenging: Models lack combinatorial generalization ability and struggle to grasp spatial relations and assembly logic between components;
  2. Gap between global and precise geometry: Models can recover the overall shape, but there are deviations in precise parameters (e.g., chair leg length, seat size);
  3. Weak component-level modeling: In assembly tasks, both component geometry and quantity are inaccurate, and models lack understanding of complex hierarchical structures.
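
The second and third findings describe errors that are easy to quantify once component labels and parameters are available. The sketch below is one hypothetical way to do so; the function names, the chair data, and the choice of relative error are assumptions for illustration, not the benchmark's scoring code.

```python
from collections import Counter


def count_mismatches(expected_labels, generated_labels):
    """Return {label: generated_count - expected_count} for labels that differ."""
    expected = Counter(expected_labels)
    generated = Counter(generated_labels)
    return {
        label: generated[label] - expected[label]
        for label in sorted(set(expected) | set(generated))
        if generated[label] != expected[label]
    }


def relative_errors(expected_params, generated_params):
    """Relative error per parameter present in both dicts (expected != 0)."""
    return {
        name: abs(generated_params[name] - value) / abs(value)
        for name, value in expected_params.items()
        if name in generated_params and value != 0
    }


# Hypothetical chair: the generated model drops a leg and shortens the others.
expected_parts = ["seat", "back", "leg", "leg", "leg", "leg"]
generated_parts = ["seat", "back", "leg", "leg", "leg"]
print(count_mismatches(expected_parts, generated_parts))  # {'leg': -1}

expected = {"leg_length": 450.0, "seat_width": 400.0}
generated = {"leg_length": 405.0, "seat_width": 400.0}
print(relative_errors(expected, generated))
```

A model can score well on overall shape while still showing a nonzero count mismatch or a 10% error on a single dimension, which is the gap these findings point to.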

Section 06

Technical Insights and Future Research Directions

Findings from P3D-Bench point to directions for improvement:

  • Strengthen geometric reasoning ability to understand and execute parametric constraints;
  • Enhance combinatorial generation ability, especially for multi-component assembly scenarios;
  • Establish a closed-loop feedback mechanism between code generation and geometric verification.
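
The closed-loop idea in the last point can be sketched as a generate, execute, verify cycle. Everything here is a hypothetical stand-in: `fake_generate` takes the place of a model call, and the verifier only checks named numeric values, whereas a real system would inspect the resulting geometry.

```python
def run_candidate(code):
    """Execute generated code; return (namespace, None) or (None, error text)."""
    namespace = {}
    try:
        exec(code, namespace)  # NOTE: sandbox untrusted code in real use
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"
    return namespace, None


def verify(namespace, constraints, tol=1e-6):
    """Return a message for every named value that is missing or off target."""
    problems = []
    for name, target in constraints.items():
        value = namespace.get(name)
        if not isinstance(value, (int, float)):
            problems.append(f"{name} is not defined as a number")
        elif abs(value - target) > tol:
            problems.append(f"{name} is {value}, expected {target}")
    return problems


def refine(generate, constraints, max_rounds=3):
    """Generate, execute, verify; feed the problems back until they are gone."""
    feedback = []
    for round_no in range(1, max_rounds + 1):
        code = generate(feedback)
        namespace, error = run_candidate(code)
        feedback = [error] if error else verify(namespace, constraints)
        if not feedback:
            return code, round_no
    return None, max_rounds


# Stand-in for a model call: fixes its output once it receives feedback.
def fake_generate(feedback):
    return "leg_length = 450.0" if feedback else "leg_length = 400.0"


print(refine(fake_generate, {"leg_length": 450.0}))  # ('leg_length = 450.0', 2)
```

Execution errors and constraint violations travel through the same feedback channel, so the generator sees both kinds of failure in the next round.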

For researchers, P3D-Bench provides a standardized evaluation platform for fair comparison of different methods. For industry, it reveals the potential and limitations of MLLMs in scenarios such as CAD, BIM, and game asset generation.

P3D-Bench lays an evaluation foundation for parametric 3D generation and steers research toward precise, controllable, and structured generation.