Zing Forum

Reading

ATLAS: A Multi-Agent LLM Framework for Accurate and Interpretable Single-Cell Annotation

A multi-agent LLM system reproduced based on the CASSIA paper, which enables automated annotation of single-cell RNA sequencing data through 7 collaborative agents, demonstrating the strong potential of AI agent collaboration in biomedical research.

ATLAS多智能体单细胞注释CASSIAscRNA-seq生物信息学LLM智能体协作细胞类型标注可解释AI
Published 2026-05-22 19:36Recent activity 2026-05-22 19:55Estimated read 6 min
ATLAS: A Multi-Agent LLM Framework for Accurate and Interpretable Single-Cell Annotation
1

Section 01

ATLAS Framework Introduction: Multi-Agent LLM Empowers Accurate and Interpretable Single-Cell Annotation

ATLAS is an open-source multi-agent LLM framework reproduced based on the CASSIA paper, designed specifically for automated cell type annotation of single-cell RNA sequencing (scRNA-seq) data. Through the division of labor and collaboration among 7 cooperative agents, it achieves high-quality, interpretable annotation results with controllable costs, demonstrating the strong potential of AI agent collaboration in biomedical research.

2

Section 02

Background: Challenges of Single-Cell Annotation and Opportunities for LLMs

Single-cell RNA sequencing technology enables single-cell resolution analysis of gene expression, but the massive amount of data also poses challenges for cell cluster annotation: traditional manual annotation is time-consuming and subjective. Large Language Models (LLMs) have the ability to understand biological literature and gene functions, providing new possibilities for automated annotation. ATLAS is exactly a multi-agent LLM framework addressing this issue.

3

Section 03

Core Methodology: Seven-Agent Collaborative Pipeline Architecture

ATLAS adopts a "divide and conquer" strategy, decomposing the annotation task into sub-tasks handled by 7 agents:

  • Core Agents: Annotator (infers cell types), Validator (verifies consistency and iteratively corrects), Formatter (converts to JSON format), Scoring (quality scoring), Reporter (generates HTML reports);
  • Optional Enhancement Modules: Annotation Boost (corrects low-confidence annotations, e.g., identifying gold standard errors), RAG (retrieves authoritative databases to enhance annotations).
4

Section 04

Technical Details: Cost Optimization and Knowledge Base Integration

Cost Optimization: Controls costs through intelligent model selection—for example, low-cost models (DeepSeek v3, Gemini Flash) are used for scoring/formatting, while strong models (Claude Sonnet4.5) are used for annotation/validation. The default pipeline costs approximately $0.04 per run; Knowledge Base Integration: Built-in with authoritative data from CellMarker2.0 and Cell Ontology, and unifies access to multiple LLMs (GPT-5, Claude, etc.) via OpenRouter, requiring only one API key.

5

Section 05

Validation Evidence: Test Cases and Performance

Results from ATLAS's test suite demonstrate its reliability:

  • Clear CD8+ T cell case: Correct (score 92);
  • Plasma cell case: Identified through noise (score 78);
  • Key breakthrough: Identified gold standard errors (corrected incorrectly labeled monocytes to enteric glial cells); These results prove that multi-agent collaboration can discover issues missed by human experts.
6

Section 06

Application Value and Relationship to CASSIA Reproduction

Application Value: ATLAS provides interpretable reasoning chains, auditable report outputs, low-cost automated annotation, and workflows that integrate scattered knowledge; Relationship to CASSIA: Faithfully reproduces the CASSIA methodology, simplifies LLM provider support (only OpenRouter), has a more readable codebase, and is suitable as a learning example.

7

Section 07

Limitations and Future Improvement Directions

Limitations: Only focuses on cell type annotation and does not extend to processes like batch effect correction; relies on predefined knowledge bases, with insufficient coverage of new cell types; Future Directions: Integrate real-time literature retrieval, expand spatial transcriptome annotation, develop visualization interfaces, and establish a community feedback loop.

8

Section 08

Conclusion: A Vertical Domain Application Example of AI Agent Collaboration

ATLAS is a successful case of AI agent application in vertical domains, achieving performance beyond a single model through task decomposition and agent collaboration. This model is not only applicable to single-cell biology but also provides a reference architecture for other scientific fields that require professional knowledge integration and complex reasoning.