
ATLAS: A New Paradigm Unifying Agentic and Implicit Visual Reasoning with a Single Token

The ATLAS framework uses "functional tokens" to unify agentic and implicit visual reasoning in a single discrete token. It avoids the latency of external tool execution while retaining interpretability, and introduces the LA-GRPO algorithm for stable training.

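Tags: visual reasoning, multimodal large models, functional tokens, ATLAS, GRPO, reinforcement learning, agentic AI, implicit reasoning, token prediction, explainable AI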

Published 2026-05-15 01:59 | Recent activity 2026-05-16 01:18


Section 01

ATLAS Framework: A New Paradigm Unifying Agentic and Implicit Visual Reasoning with Functional Tokens

The ATLAS framework is a new visual reasoning paradigm proposed by researchers from institutions including the Chinese University of Hong Kong and Shanghai Artificial Intelligence Laboratory. Its core innovation is the functional token, which unifies agentic reasoning and implicit visual reasoning in a single discrete token. This design removes the external execution latency of agentic reasoning while keeping its interpretability. ATLAS also introduces the LA-GRPO algorithm to address the sparsity of functional tokens during training, so that gains in performance do not come at the cost of interpretability.

Section 02

Background: The Dilemma of Visual Reasoning

Visual reasoning requires handling intermediate visual states, but both existing technical routes have limitations:

Agentic reasoning: Manipulates visual content through code or external tools. It is highly interpretable, but switching context to external execution adds overhead and slows inference;
Implicit reasoning: Represents visual states with internal hidden embeddings. It is fast, but generalizes poorly and is hard to reconcile with parallel autoregressive training.

Section 03

Core of ATLAS: Threefold Design of Functional Tokens

Functional tokens are the core of ATLAS. Their design has three aspects:

Internalized visual operations: Each functional token is associated with an internal visual operation (e.g., rotation, zooming) that requires no external tool, eliminating tool-execution latency;
Standard token attributes: Functional tokens belong to the tokenizer vocabulary and are generated by standard next-token prediction, so the model architecture does not need to change;
No visual supervision needed: Functional tokens are learned end to end from task objectives (e.g., whether a question is answered correctly) without explicit visual annotations.
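To make the "standard token attributes" point concrete, here is a minimal toy sketch, not the ATLAS implementation: functional tokens are appended to an ordinary vocabulary, chosen by the same greedy next-token step as any other token, and read back out of the generated sequence as an interpretable trace. The token strings `<ft_rotate>` and `<ft_zoom>` and the `ToyVocab` class are hypothetical; the article names rotation and zooming only as example operations, and this sketch does not model how the operations are carried out inside the network.

```python
from dataclasses import dataclass, field

# Hypothetical token strings: the article names rotation and zooming as example
# operations but does not publish ATLAS's actual functional-token vocabulary.
FUNCTIONAL_TOKENS = ["<ft_rotate>", "<ft_zoom>"]


@dataclass
class ToyVocab:
    """Minimal tokenizer vocabulary mapping token strings to integer ids."""
    tokens: list = field(default_factory=list)

    def __post_init__(self):
        # Reverse lookup: token string -> id.
        self.index = {tok: i for i, tok in enumerate(self.tokens)}

    def add_tokens(self, new_tokens):
        """Append tokens not already present; return how many were added."""
        added = 0
        for tok in new_tokens:
            if tok not in self.index:
                self.index[tok] = len(self.tokens)
                self.tokens.append(tok)
                added += 1
        return added

    def decode(self, ids):
        return [self.tokens[i] for i in ids]


def greedy_step(logits):
    """Ordinary next-token prediction: return the id with the highest score over
    the whole vocabulary. Functional tokens need no special decoding path."""
    return max(range(len(logits)), key=lambda i: logits[i])


def functional_trace(vocab, ids):
    """Return the functional tokens of a generated sequence, in order; this
    trace is what makes the reasoning steps inspectable."""
    ft = set(FUNCTIONAL_TOKENS)
    return [tok for tok in vocab.decode(ids) if tok in ft]


if __name__ == "__main__":
    vocab = ToyVocab(["<bos>", "the", "angle", "is", "30", "<eos>"])
    vocab.add_tokens(FUNCTIONAL_TOKENS)  # new ids 6 and 7
    # Mock logits over the 8-token vocabulary; id 7 scores highest.
    print(vocab.tokens[greedy_step([0.1, 0.0, 0.2, 0.0, 0.3, 0.0, 0.5, 0.9])])  # <ft_zoom>
    print(functional_trace(vocab, [0, 6, 2, 7, 3, 4, 5]))  # ['<ft_rotate>', '<ft_zoom>']
```

Because selection is an ordinary argmax over the extended vocabulary, no separate decoding path or tool call is involved; the interpretability comes from the functional tokens appearing, in order, in the output sequence.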

Section 04

LA-GRPO: Key Algorithm to Solve Sparsity in Functional Token Training

Training functional tokens is difficult early on because they are sparse: they make up only a tiny fraction of generated tokens, so the gradient signal reaching them is weak. LA-GRPO adds a statically weighted auxiliary objective that places an anchor loss on functional tokens. This provides stable gradients even when a batch contains only a few functional tokens, retaining the sample efficiency of GRPO while addressing the training instability.
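The article describes LA-GRPO only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It shows the two ingredients named above: GRPO's group-relative advantage, A_i = (r_i - mean(r)) / std(r) computed over the responses sampled for one prompt, and a statically weighted anchor term on functional tokens. The exact anchor loss is not given; here it is assumed to be the mean negative log-probability over functional-token positions, divided by their count so its scale does not shrink when they are rare. The names `anchor_term`, `la_grpo_style_loss`, and `anchor_weight` are hypothetical, and full GRPO's ratio clipping and KL penalty are omitted.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each response's reward against the
    other responses sampled for the same prompt (no learned value function)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def policy_term(logprobs, advantages):
    """Simplified policy-gradient surrogate: each response's mean token
    log-prob scaled by its advantage, averaged over the group.
    (Ratio clipping and the KL penalty of full GRPO are omitted.)"""
    per_resp = [-a * sum(lps) / len(lps) for a, lps in zip(advantages, logprobs)]
    return sum(per_resp) / len(per_resp)


def anchor_term(logprobs, functional_mask):
    """ASSUMED anchor loss: mean negative log-prob over functional-token
    positions only. Normalizing by their count (not by all tokens) keeps the
    term's scale fixed even when functional tokens are rare in the batch."""
    ft = [lp for lps, mask in zip(logprobs, functional_mask)
          for lp, is_ft in zip(lps, mask) if is_ft]
    return -sum(ft) / len(ft) if ft else 0.0


def la_grpo_style_loss(logprobs, functional_mask, rewards, anchor_weight=0.1):
    """Group-relative policy term plus a statically weighted anchor term."""
    adv = group_advantages(rewards)
    return policy_term(logprobs, adv) + anchor_weight * anchor_term(logprobs, functional_mask)


if __name__ == "__main__":
    # Two sampled responses to one prompt; only one token in the group is functional.
    logprobs = [[-1.0, -2.0], [-0.5, -0.5]]
    mask = [[False, True], [False, False]]
    print(la_grpo_style_loss(logprobs, mask, rewards=[1.0, 0.0]))  # ~0.7
```

In a token-averaged loss, a single functional token among many ordinary ones receives a small share of the gradient; the count-normalized anchor gives it a fixed-weight signal regardless of how sparse it is, which is the intuition the paragraph above describes.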

Section 05

Experimental Validation: Performance of ATLAS on Multiple Tasks

ATLAS performs strongly across multiple visual reasoning benchmarks:

Geometric reasoning: On tasks requiring precise judgments of spatial relationships, the functional tokens make the reasoning process explicit;
Visual question answering: On complex QA tasks that require multi-step reasoning, it achieves leading accuracy, and its functional-token sequences expose the logic behind each answer;
Baseline comparison: Inference latency is an order of magnitude lower than that of purely agentic methods, and generalization and training stability are better than those of purely implicit methods.

Section 06

Technical Significance and Future Directions: Discrete Tokens Connecting Symbolic and Neural Reasoning

ATLAS suggests that discrete tokens can bridge symbolic reasoning and neural computation, unifying agentic reasoning (symbolic, interpretable) with implicit neural reasoning (continuous, efficient). Possible future directions include:

Internalized tool learning: Build the functions of common tools into functional tokens;
Unified multimodal representation: Use functional tokens as a shared interface for operations across modalities;
Enhanced interpretability: Discrete tokens make the reasoning process transparent, which is valuable in high-stakes applications.

Section 07

Resource Links: ATLAS Open-Source Code and Paper Addresses

The ATLAS project code has been open-sourced.

Code: https://github.com/ZiyuGuo99/ATLAS
Paper: https://arxiv.org/abs/2605.15198
