# CAPruner: Enhancing 3D Spatial Reasoning of Large Language Models via Concept-Adjacent Scene Graph Pruning

> CAPruner is a novel scene graph pruning method that improves the performance of large language models (LLMs) in 3D spatial reasoning tasks by identifying and leveraging concept-adjacent relationships. This method effectively filters redundant information and helps models focus on key spatial relationships.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-03T13:38:26.000Z
- Last activity: 2026-05-03T13:49:06.600Z
- Popularity: 146.8
- Keywords: 3D spatial reasoning, scene graph pruning, large language models, visual question answering, concept adjacency, multimodal learning
- Page link: https://www.zingnex.cn/en/forum/thread/capruner-3d
- Canonical: https://www.zingnex.cn/forum/thread/capruner-3d
- Markdown source: floors_fallback

---

## [Introduction] CAPruner: A New Method to Enhance 3D Spatial Reasoning of Large Language Models

CAPruner is a scene graph pruning method based on concept adjacency. It targets a specific failure mode: when large language models (LLMs) handle 3D spatial reasoning tasks, redundant scene information can drown out the relationships that matter. By identifying and retaining the scene elements that are semantically adjacent to the query concept, the method filters out the rest and improves both reasoning accuracy and efficiency.

## Background and Challenges: Pain Points in 3D Spatial Reasoning

Large language models have made significant progress in natural language processing, but complex 3D spatial reasoning remains difficult for them. A 3D scene contains many objects and relationships, which together form a large scene graph. Feeding the whole graph to an LLM lets redundant information interfere with the relationships the question actually depends on. Existing methods mostly prune with simple heuristics or random sampling. These are not query-aware and may remove nodes that carry key relationships. The open question is how to decide which information is worth keeping.

## Core Idea and Technical Approach

### Core Idea
CAPruner is built on "concept adjacency": in 3D reasoning, the scene elements semantically adjacent to the query concept carry the most useful information. For example, when answering "What is next to the sofa?", a coffee table or carpet is more relevant than a refrigerator.
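
The sofa example can be made concrete with a toy scene graph. The sketch below is not CAPruner's implementation: the triples, the function names, and the use of plain graph-hop distance as a stand-in for concept adjacency are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy scene graph as (subject, relation, object) triples.
TRIPLES = [
    ("coffee table", "in front of", "sofa"),
    ("carpet", "under", "sofa"),
    ("lamp", "next to", "sofa"),
    ("refrigerator", "next to", "kitchen counter"),
    ("kitchen counter", "across the room from", "lamp"),
]

def build_adjacency(triples):
    """Build an undirected adjacency map from relation triples."""
    adj = {}
    for subj, _rel, obj in triples:
        adj.setdefault(subj, set()).add(obj)
        adj.setdefault(obj, set()).add(subj)
    return adj

def concept_neighbors(adj, concept, max_hops=1):
    """Return the nodes within max_hops of the query concept, excluding it."""
    hops = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in adj.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return sorted(n for n in hops if n != concept)

adj = build_adjacency(TRIPLES)
print(concept_neighbors(adj, "sofa"))  # ['carpet', 'coffee table', 'lamp']
```

With a one-hop budget the refrigerator is dropped, since it is only reachable from the sofa through the lamp and the kitchen counter.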

### Technical Approach
1. **Scene Graph Representation and Encoding**: Convert 3D scenes into structured scene graphs (nodes are objects, edges are spatial relationships), and obtain semantic embeddings via a vision-language encoder.
2. **Concept Adjacency Measurement**: Compute an importance score for each node by combining three signals: semantic similarity (how close the node is to the query), topological proximity (graph centrality and connectivity), and relationship path weight (shorter, less ambiguous paths receive higher weights).
3. **Adaptive Pruning**: Adjust the pruning threshold dynamically based on query complexity and scene graph density. Simple queries are pruned aggressively, while complex tasks retain more context.
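
Steps 2 and 3 can be sketched as a weighted score followed by a top-k cut. The post does not give the actual formula, so everything here is an assumption: the weighted-sum form, the weights, the `1 / (1 + hops)` proximity term, the `keep_ratio` rule (including the choice to prune denser graphs harder), and the toy 2D embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def adjacency_score(node, query_emb, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three signals; the weights are illustrative only."""
    w_sem, w_topo, w_path = weights
    semantic = cosine(node["emb"], query_emb)
    topological = 1.0 / (1.0 + node["hops"])  # closer to the query anchor scores higher
    return w_sem * semantic + w_topo * topological + w_path * node["path"]

def keep_ratio(query_hops, density, base=0.4):
    """Hypothetical adaptive rule: keep more for multi-hop queries, less for dense graphs."""
    ratio = base + 0.2 * (query_hops - 1) - 0.2 * density
    return min(1.0, max(0.1, ratio))

def prune(nodes, query_emb, query_hops, density):
    """Return the names of the highest-scoring nodes, best first."""
    ranked = sorted(nodes, key=lambda n: adjacency_score(nodes[n], query_emb), reverse=True)
    k = math.ceil(keep_ratio(query_hops, density) * len(ranked))
    return ranked[:k]

# Toy nodes: 2D embeddings, hop distance to the query anchor, path weight in [0, 1].
NODES = {
    "coffee table": {"emb": (0.9, 0.1), "hops": 1, "path": 1.0},
    "carpet": {"emb": (0.8, 0.3), "hops": 1, "path": 0.8},
    "lamp": {"emb": (0.6, 0.6), "hops": 2, "path": 0.5},
    "refrigerator": {"emb": (0.1, 0.9), "hops": 4, "path": 0.2},
}
QUERY_EMB = (1.0, 0.0)

simple_kept = prune(NODES, QUERY_EMB, query_hops=1, density=0.5)
complex_kept = prune(NODES, QUERY_EMB, query_hops=3, density=0.5)
```

In this toy setup the single-hop query keeps only the coffee table and the carpet, while the three-hop query also keeps the lamp. The refrigerator is pruned in both cases.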

## Experimental Validation: Significant Performance Improvement

In 3D-VQA evaluations on datasets such as ScanNet and 3DSSG, CAPruner is reported to achieve the following:
- Reasoning accuracy increased by 8-15 percentage points over the baseline;
- Pruning removed 60-70% of scene graph nodes while retaining the key information;
- Model processing time decreased by approximately 40%;
- Gains were most pronounced on multi-hop reasoning (e.g., "Can the person sitting on the sofa see the TV?"), where the intermediate reasoning nodes were retained.
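
One way to picture "retaining intermediate reasoning nodes" for the multi-hop example is to protect every node on a shortest path between the concepts named in the query. This is a minimal sketch under that assumption, not the method described in the post; the graph and function names are hypothetical.

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search; returns the node list from start to goal, or [] if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return []

def protected_nodes(adj, concepts):
    """Query concepts plus the nodes on shortest paths between consecutive concepts."""
    keep = set(concepts)
    for a, b in zip(concepts, concepts[1:]):
        keep.update(shortest_path(adj, a, b))
    return keep

# Hypothetical living-room graph; edges are undirected spatial relations.
ADJ = {
    "person": ["sofa"],
    "sofa": ["person", "coffee table", "carpet"],
    "coffee table": ["sofa", "tv stand"],
    "tv stand": ["coffee table", "tv"],
    "tv": ["tv stand"],
    "carpet": ["sofa"],
    "refrigerator": ["kitchen counter"],
    "kitchen counter": ["refrigerator"],
}

kept = protected_nodes(ADJ, ["person", "sofa", "tv"])
```

Here `kept` contains the person, the sofa, and the TV plus the coffee table and TV stand that link them. The carpet, refrigerator, and kitchen counter fall outside the protected set and remain candidates for pruning.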

## Practical Application Value: Multi-Domain Scenarios

CAPruner can be applied in:
- **Smart Home and Robot Navigation**: Helping service robots understand spatial layouts and execute complex instructions;
- **AR/VR**: Accurately understanding 3D scene relationships to support reasonable placement and interaction of virtual objects;
- **Autonomous Driving**: Assisting in understanding spatial relationships in traffic scenarios (e.g., judging overtaking space).

## Limitations and Future Directions

Several directions remain open for CAPruner:
1. **Dynamic Scene Processing**: The method currently targets static scenes and needs to be adapted to dynamic 3D environments (e.g., moving objects);
2. **Cross-Modal Fusion**: Deepen the fusion of visual, language, and depth information to improve understanding of complex spatial relationships;
3. **Zero-Shot Generalization**: Enhance the model's generalization ability to unseen scenes and object categories.

## Conclusion: Significance and Prospects of CAPruner

CAPruner uses concept adjacency as its pruning criterion to process scene graphs efficiently for 3D spatial reasoning. It improves LLM performance on 3D-VQA tasks and offers a useful direction for multimodal large models. As embodied intelligence and robotics advance, spatial reasoning enhancements of this kind are likely to play an important role in practical applications.