# 3D-CoS: A New Paradigm for 3D Reconstruction Based on Visual Language Model Code Synthesis

> 3D-CoS (3D Code Synthesis) proposes a new paradigm of generating 3D assets as executable Blender code. It improves generation quality through blueprint planning, RAG retrieval, few-shot demonstrations, and a component-level agent workflow, and demonstrates unique advantages in editability and local modification.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-09T06:46:29.000Z
- Last activity: 2026-06-10T03:57:58.748Z
- Keywords: 3D reconstruction, code synthesis, vision-language models, Blender, procedural modeling, editability, RAG, 3D content generation
- Page link: https://www.zingnex.cn/en/forum/thread/3d-cos-3d
- Canonical: https://www.zingnex.cn/forum/thread/3d-cos-3d

---

## 3D-CoS: Guide to the New 3D Reconstruction Paradigm Based on VLM Code Synthesis

**Source Information**
- Original Authors: arXiv authors
- Source Platform: arXiv
- Original Title: 3D-CoS: A New 3D Reconstruction Paradigm Based on VLM Code Synthesis
- Link: http://arxiv.org/abs/2606.10478v1
- Publication Date: 2026-06-09

## Limitations of Traditional 3D Representations

Current mainstream 3D reconstruction and editing systems rely on implicit (e.g., NeRF) or explicit (e.g., point clouds, meshes) representations. While these offer high rendering fidelity, they have fundamental limitations when it comes to editing:
- **Mesh**: Editing requires direct manipulation of vertices and faces, which demands technical expertise, and the representation carries no semantics (e.g., modifying a chair leg first requires locating the vertices that belong to it);
- **NeRF**: The scene is encoded in network weights, so modification requires retraining or complex inverse rendering, which makes precise local adjustments difficult;
- **Point Cloud**: Lacks topological (connectivity) information, so edits easily introduce geometric discontinuities.

## 3D Code Synthesis: Programmable 3D Representation

3D-CoS represents 3D assets as Blender-executable Python code, bringing three key advantages:
1. **Interpretability**: Code has clear semantics, allowing humans to understand the 3D object construction process;
2. **Editability**: Adjusting code parameters (e.g., cylinder height) enables precise shape control;
3. **Local Modification Capability**: Modifying the code of one component (e.g., chair leg length) leaves the other parts untouched, addressing a key weakness of the traditional representations above.
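To make the editability and locality claims concrete, here is a minimal sketch, not the authors' code, of a chair described as parameterized components and rendered to Blender Python. `ChairParams`, `Part`, `build_chair`, and `to_bpy` are hypothetical names and the dimensions are made up; only the two `bpy.ops.mesh` operators in the emitted script are real Blender calls. The generator itself is plain Python, so the script can be inspected without Blender.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ChairParams:
    """Hypothetical chair parameters, in Blender units (metres by default)."""
    seat_width: float = 0.45
    seat_depth: float = 0.45
    seat_thickness: float = 0.05
    leg_length: float = 0.45
    leg_radius: float = 0.02
    back_height: float = 0.40


@dataclass(frozen=True)
class Part:
    component: str                       # semantic label, e.g. "legs"
    kind: str                            # "cube" or "cylinder"
    location: tuple[float, float, float] # centre of the part
    dims: tuple[float, float, float]     # final extent along x, y, z


def build_chair(p: ChairParams) -> list[Part]:
    """Derive every part from the parameters, so joints follow the edits."""
    top = p.leg_length                   # z of the seat's underside
    dx = p.seat_width / 2 - p.leg_radius
    dy = p.seat_depth / 2 - p.leg_radius
    parts = [Part("seat", "cube", (0.0, 0.0, top + p.seat_thickness / 2),
                  (p.seat_width, p.seat_depth, p.seat_thickness))]
    for sx in (-1, 1):
        for sy in (-1, 1):
            # Cylinders are centred on `location`, so z = top / 2 spans 0..top.
            parts.append(Part("legs", "cylinder", (sx * dx, sy * dy, top / 2),
                              (2 * p.leg_radius, 2 * p.leg_radius, p.leg_length)))
    parts.append(Part("backrest", "cube",
                      (0.0, -p.seat_depth / 2 + p.seat_thickness / 2,
                       top + p.seat_thickness + p.back_height / 2),
                      (p.seat_width, p.seat_thickness, p.back_height)))
    return parts


def to_bpy(parts: list[Part]) -> str:
    """Render parts as a Blender Python script (bpy is only needed to run it)."""
    lines = ["import bpy"]
    for part in parts:
        loc = "({:.3f}, {:.3f}, {:.3f})".format(*part.location)
        if part.kind == "cube":
            # size=1.0 gives a unit cube, so `scale` equals the final dimensions.
            scale = "({:.3f}, {:.3f}, {:.3f})".format(*part.dims)
            lines.append(f"bpy.ops.mesh.primitive_cube_add(size=1.0, location={loc}, scale={scale})")
        else:
            lines.append("bpy.ops.mesh.primitive_cylinder_add("
                         f"radius={part.dims[0] / 2:.3f}, depth={part.dims[2]:.3f}, location={loc})")
    return "\n".join(lines)


if __name__ == "__main__":
    taller = replace(ChairParams(), leg_length=0.60)  # the single edit a user asks for
    print(to_bpy(build_chair(taller)))
```

Lengthening the legs changes only the cylinders' `depth` and the heights at which the seat and backrest sit. Their own dimensions are untouched, and because their placement is derived from `leg_length`, the joints stay closed. The returned string would be executed inside Blender's Python environment to build the mesh.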

## Structured Code Synthesis Workflow

The research team designed a structured workflow to guide VLMs in generating code:
- **Blueprint Planning**: First generate a sequence of high-level steps (e.g., chair construction: seat → legs → backrest → assembly);
- **RAG Enhancement**: Retrieve Blender API documentation during generation to improve code correctness;
- **Few-shot Demonstrations**: Provide examples of complex geometric operations to help VLMs map geometric concepts to code;
- **Component-level Agents**: Following a divide-and-conquer strategy, each component is generated by a dedicated agent, and a main agent coordinates the assembly relationships between components.
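The four ingredients can be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the VLM is any `prompt -> text` function (stubbed here so the code runs offline), `API_DOCS` and `FEW_SHOT` are tiny hypothetical stand-ins for the Blender documentation corpus and the demonstration set, and the keyword-overlap retriever replaces whatever retriever the authors actually use. All function names are hypothetical.

```python
import re
from typing import Callable

VLM = Callable[[str], str]  # assumption: any prompt -> text function wrapping a model

# Hypothetical stand-ins for the Blender API documentation corpus used for RAG.
API_DOCS = [
    "bpy.ops.mesh.primitive_cylinder_add(radius, depth, location): add a cylinder mesh",
    "bpy.ops.mesh.primitive_cube_add(size, location, scale): add a cube mesh",
    "bpy.ops.object.modifier_add(type): add a modifier to the active object",
]

# Hypothetical few-shot demonstrations mapping a geometric concept to code.
FEW_SHOT = [
    ("a thin round table top", "bpy.ops.mesh.primitive_cylinder_add(radius=0.5, depth=0.03)"),
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retriever: rank doc snippets by word overlap with the query."""
    q = tokens(query)
    return sorted(API_DOCS, key=lambda doc: len(q & tokens(doc)), reverse=True)[:k]


def plan_blueprint(vlm: VLM, description: str) -> list[str]:
    """Blueprint planning: ask for an ordered list of components, one per line."""
    reply = vlm(f"List the components, in build order, for: {description}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def component_agent(vlm: VLM, component: str) -> str:
    """One dedicated agent per component, prompted with retrieved docs and demos."""
    docs = "\n".join(retrieve(component))
    demos = "\n".join(f"# {concept}\n{code}" for concept, code in FEW_SHOT)
    return vlm(f"API reference:\n{docs}\n\nExamples:\n{demos}\n\nWrite Blender code for: {component}")


def main_agent(vlm: VLM, description: str) -> str:
    """Plan, delegate each component, then ask for code that assembles the parts."""
    parts = {c: component_agent(vlm, c) for c in plan_blueprint(vlm, description)}
    assembly = vlm("Write code that positions these components: " + ", ".join(parts))
    sections = [f"# {name}\n{code}" for name, code in parts.items()]
    return "\n\n".join(["import bpy", *sections, f"# assembly\n{assembly}"])


def fake_vlm(prompt: str) -> str:
    """Offline stub so the pipeline can run without a model."""
    if prompt.startswith("List the components"):
        return "seat\nlegs\nbackrest"
    if prompt.startswith("Write code that positions"):
        return "pass  # placement logic would go here"
    return "bpy.ops.mesh.primitive_cube_add(size=1.0)"


print(main_agent(fake_vlm, "a simple chair"))
```

Keeping each component in its own prompt and its own labelled code section is what later makes local edits cheap: a change to the legs only needs the `# legs` section regenerated.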

## Advantages of Local Text-Driven Editing (Experimental Evidence)

Experiments compared the same text-driven local edits applied to code and to point clouds:
- **Point Cloud Editing**: The point set corresponding to the chair legs must first be identified, which easily leads to cracks at the joints or accidental damage to adjacent components;
- **Code Editing**: The chair leg parameters are modified directly, which is semantically clear and precise and maintains geometric continuity.

The results show that the code representation is superior in both edit fidelity and the preservation of unedited regions.
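Why joints are the weak spot for point-cloud edits can be seen in a toy example with made-up coordinates (this is not the paper's data or metric). Where a leg meets the seat, leg and seat points coincide, so a purely geometric selection such as "everything at or below height `z_cut`" either grabs seat points or leaves part of the leg behind. `select_below` and `selection_errors` are hypothetical helpers.

```python
# Toy labelled point cloud: one chair leg (x = y = 0) meeting the seat at z = 0.45.
POINTS = [(0.0, 0.0, z) for z in (0.00, 0.15, 0.30, 0.45)] + [(0.0, 0.0, 0.45), (0.0, 0.0, 0.50)]
LABELS = ["legs"] * 4 + ["seat"] * 2  # ground truth, unknown to a real point-cloud edit


def select_below(points, z_cut):
    """Geometric selection: the only cue an unlabelled point cloud offers."""
    return [z <= z_cut for _, _, z in points]


def selection_errors(points, labels, target, z_cut):
    """Return (points wrongly grabbed from other parts, target points left behind)."""
    selected = select_below(points, z_cut)
    grabbed = sum(s and lab != target for s, lab in zip(selected, labels))
    missed = sum(not s and lab == target for s, lab in zip(selected, labels))
    return grabbed, missed


for z_cut in (0.40, 0.44, 0.45, 0.48):
    # -> (0, 1) for cuts below 0.45, (1, 0) from 0.45 upward
    print(z_cut, selection_errors(POINTS, LABELS, "legs", z_cut))
```

No cutoff yields `(0, 0)`: stretching the selection then either leaves a crack inside the leg or deforms the seat. In the code representation this selection step does not exist, because the legs are already their own block of code.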

## Exploration of VLM Capability Boundaries

The authors probed the capability boundaries of both open-source VLMs (e.g., LLaVA, Qwen-VL) and closed-source VLMs (e.g., GPT-4V, Claude):
- **Limitations**: Limited understanding of 3D spatial relationships, difficulty with complex topologies, and incomplete command of the Blender API;
- **Effect of Enhancement Strategies**: RAG improves accuracy on API-related tasks, few-shot examples help the model learn geometric operations, and blueprint planning makes the generated structures more plausible.
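Incomplete command of the Blender API can show up as calls to operators that do not exist. One cheap way to surface such errors, offered as an assumption rather than something the source describes, is to parse generated code with Python's standard `ast` module and flag `bpy.ops` calls missing from a whitelist. `KNOWN_OPS` is a tiny hypothetical whitelist (a real one could be collected inside Blender, for example from `dir(bpy.ops.mesh)`), and `primitive_chair_add` is an invented operator standing in for a hallucination.

```python
import ast
from typing import Optional

# Hypothetical, deliberately tiny whitelist of real Blender operator paths.
KNOWN_OPS = {
    "bpy.ops.mesh.primitive_cube_add",
    "bpy.ops.mesh.primitive_cylinder_add",
    "bpy.ops.object.modifier_add",
}


def dotted_name(node: ast.AST) -> Optional[str]:
    """Rebuild an attribute chain such as bpy.ops.mesh.foo into 'bpy.ops.mesh.foo'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. calls on subscripts or on call results


def unknown_ops(code: str) -> list[str]:
    """List bpy.ops calls not in KNOWN_OPS; raises SyntaxError if the code does not parse."""
    found = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name and name.startswith("bpy.ops.") and name not in KNOWN_OPS:
                found.append(name)
    return found


generated = """
import bpy
bpy.ops.mesh.primitive_cylinder_add(radius=0.02, depth=0.45)
bpy.ops.mesh.primitive_chair_add(size=1.0)  # hallucinated operator
"""
print(unknown_ops(generated))  # ['bpy.ops.mesh.primitive_chair_add']
```

A check like this only catches nonexistent names; wrong arguments or geometrically wrong but valid calls still require executing the script in Blender and inspecting the result.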

## Application Prospects and Future Research Directions

**Application Prospects**:
- Content Creation: Generate prototypes from natural language, then refine them by editing the code;
- Industrial Design: Parametric product definitions that are easy to put under version control;
- Education: Interpretable 3D modeling learning materials.

**Future Directions**:
- Extend to platforms like Maya/3ds Max;
- Support complex operations like surface modeling and physical simulation;
- Optimize VLM architecture for 3D code generation;
- Integrate code representation with other 3D representations like NeRF and Gaussian splatting.
