# ATLAS: A New Paradigm Unifying Agentic and Implicit Visual Reasoning with a Single Token

> The ATLAS framework unifies agentic reasoning and implicit visual reasoning into a single discrete token via "functional tokens". It avoids external execution latency while retaining interpretability, and introduces LA-GRPO for stable training.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-14T17:59:55.000Z
- Last activity: 2026-05-15T17:18:33.533Z
- Popularity: 131.7
- Keywords: visual reasoning, multimodal large models, functional tokens, ATLAS, GRPO, reinforcement learning, agentic AI, implicit reasoning, token prediction, interpretable AI
- Page link: https://www.zingnex.cn/en/forum/thread/atlas
- Canonical: https://www.zingnex.cn/forum/thread/atlas

---

## ATLAS Framework: A New Paradigm Unifying Agentic and Implicit Visual Reasoning with Functional Tokens

The ATLAS framework is a visual reasoning paradigm proposed by institutions including the Chinese University of Hong Kong and Shanghai Artificial Intelligence Laboratory. Its core idea is to unify agentic reasoning and implicit visual reasoning in a single discrete token, the **functional token**. This design removes the external execution latency of agentic reasoning while keeping its interpretability. ATLAS also introduces the **LA-GRPO algorithm** to address the sparsity of functional tokens during training, so that performance does not have to be traded against interpretability.

## Background: The Dilemma of Visual Reasoning

Visual reasoning has to track intermediate visual states, and each of the two existing approaches has limitations:
- **Agentic reasoning**: Manipulates visual content through code or external tools. It is highly interpretable, but context switching is expensive and inference is slow.
- **Implicit reasoning**: Represents visual states with internal hidden embeddings. It is fast, but generalizes poorly and is hard to reconcile with parallel autoregressive training.

## Core of ATLAS: Threefold Design of Functional Tokens

Functional tokens are the core of ATLAS. Their design has three parts:
1. **Internalized visual operations**: Each functional token is associated with an internal visual operation (e.g., rotation, zooming), so no external tool is called and the associated latency disappears.
2. **Standard token attributes**: Functional tokens belong to the tokenizer vocabulary and are generated by standard next-token prediction, with no change to the model architecture.
3. **No visual supervision needed**: They are learned end to end from the task objective (e.g., whether the answer to a question is correct), without explicit visual annotations.
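
The second point can be illustrated with a minimal sketch. It assumes hypothetical token names (`<rotate>`, `<zoom>`) and a toy vocabulary and logits, none of which come from the ATLAS release. The helper functions are illustrative only. The point is that once functional tokens are appended to the vocabulary, they compete in the same softmax as ordinary text tokens.

```python
import math

# Hypothetical functional tokens; the real ATLAS token names are not given here.
FUNCTIONAL_TOKENS = ["<rotate>", "<zoom>"]

def extend_vocab(vocab, new_tokens):
    """Return a copy of vocab (token -> id) with new tokens appended at fresh ids."""
    extended = dict(vocab)
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = len(extended)
    return extended

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next_token(logits, vocab):
    """Standard next-token prediction: argmax over the whole vocabulary."""
    id_to_token = {i: t for t, i in vocab.items()}
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id_to_token[best], probs[best]

base_vocab = {"the": 0, "cube": 1, "is": 2, "red": 3}
vocab = extend_vocab(base_vocab, FUNCTIONAL_TOKENS)

# Toy logits, one per vocabulary entry; a real model would produce these.
logits = [0.1, 0.2, 0.0, 0.3, 2.5, 0.4]
token, prob = greedy_next_token(logits, vocab)
```

In this toy setup the prediction head needs no special case for functional tokens: they are just extra rows in the output distribution, which is what "without modifying the model architecture" suggests.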

## LA-GRPO: Key Algorithm to Solve Sparsity in Functional Token Training

Early in training, functional tokens are sparse: they make up a very small share of the generated tokens, so the gradient signal they receive is weak. The LA-GRPO algorithm introduces **statically weighted auxiliary objectives** and adds anchor loss terms for functional tokens. Even when a batch contains few functional tokens, these terms provide stable gradients, which keeps the sample efficiency of GRPO while addressing the training instability.
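
The idea can be sketched as follows. This is not the paper's formulation: the exact form of the anchor term is not given here, so the sketch assumes a simplified GRPO policy term (group-normalized advantages, no clipping or KL penalty) plus an anchor term that is a negative log-likelihood over functional tokens in above-average samples, scaled by a fixed weight. The names `la_grpo_loss_sketch` and `anchor_weight` are hypothetical.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: rewards normalized within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def la_grpo_loss_sketch(samples, anchor_weight=0.1):
    """Simplified policy loss plus a statically weighted anchor loss (assumed form).

    Each sample is a dict with:
      'reward'        : scalar task reward
      'logprobs'      : per-token log-probabilities under the policy
      'is_functional' : per-token booleans marking functional tokens
    """
    advantages = group_relative_advantages([s["reward"] for s in samples])
    policy_terms, anchor_terms = [], []
    for sample, adv in zip(samples, advantages):
        for lp, is_func in zip(sample["logprobs"], sample["is_functional"]):
            policy_terms.append(-adv * lp)
            # Assumption: anchor only functional tokens from above-average samples.
            if is_func and adv > 0:
                anchor_terms.append(-lp)
    policy_loss = sum(policy_terms) / len(policy_terms)
    # Averaged over functional tokens only, so the term does not shrink
    # when functional tokens are a tiny fraction of the batch.
    anchor_loss = sum(anchor_terms) / len(anchor_terms) if anchor_terms else 0.0
    total = policy_loss + anchor_weight * anchor_loss
    return total, policy_loss, anchor_loss

samples = [
    {"reward": 1.0, "logprobs": [-0.5, -2.0, -0.3],
     "is_functional": [False, True, False]},
    {"reward": 0.0, "logprobs": [-0.4, -0.6],
     "is_functional": [False, False]},
]
total, policy_loss, anchor_loss = la_grpo_loss_sketch(samples)
```

The design choice worth noting is the normalization: the policy term is averaged over all tokens, where one functional token among many contributes little, while the anchor term is averaged over functional tokens alone and scaled by a constant weight.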

## Experimental Validation: Performance of ATLAS on Multiple Tasks

ATLAS is reported to perform well on multiple visual reasoning benchmarks:
- **Geometric reasoning**: In tasks that require precise judgment of spatial relationships, the functional tokens make the reasoning process visible.
- **Visual question answering**: In complex multi-step QA tasks, it reportedly achieves leading accuracy and can explain its logic through the functional token sequence.
- **Baseline comparison**: Inference latency is reduced by an order of magnitude compared with pure agentic methods, and generalization and training stability are better than with pure implicit methods.
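
Because functional tokens are discrete, a reasoning trace can be read directly from the generated sequence. The sketch below assumes hypothetical token names and descriptions and a toy output sequence. It only illustrates how such a trace could be extracted, not how ATLAS reports it.

```python
# Hypothetical mapping from functional token to a human-readable description.
FUNCTIONAL_DESCRIPTIONS = {
    "<rotate>": "rotate the view",
    "<zoom>": "zoom into a region",
}

def extract_trace(tokens, descriptions=FUNCTIONAL_DESCRIPTIONS):
    """Return (position, token, description) for every functional token, in order."""
    return [
        (i, tok, descriptions[tok])
        for i, tok in enumerate(tokens)
        if tok in descriptions
    ]

# Toy generated sequence mixing text tokens and functional tokens.
generated = ["The", "<zoom>", "label", "reads", "<rotate>", "42"]
trace = extract_trace(generated)

for position, tok, meaning in trace:
    print(f"step at token {position}: {tok} ({meaning})")
```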

## Technical Significance and Future Directions: Discrete Tokens Connecting Symbolic and Neural Reasoning

The significance of ATLAS is that it shows discrete tokens can bridge symbolic reasoning and neural computation, unifying agentic reasoning (symbolic, interpretable) with implicit neural reasoning (continuous, efficient). Future directions include:
1. **Internalization of tool learning**: Internalize common tool functions as functional tokens.
2. **Unified multi-modal representation**: Use functional tokens as an operation interface across modalities.
3. **Enhanced interpretability**: Discrete tokens make the reasoning process transparent, which suits high-risk scenarios.

## Resource Links: ATLAS Open-Source Code and Paper Addresses

- The ATLAS project code has been open-sourced: <https://github.com/ZiyuGuo99/ATLAS>
- Paper link: <https://arxiv.org/abs/2605.15198>