Zing Forum

AST-Analyzer: A Precise Code Context Extraction Engine for LLMs

A static analysis tool written in Go that extracts symbol-level context from TypeScript codebases through AST parsing and call graph analysis, replacing the traditional approach of "stuffing entire files into prompts".

Tags: static analysis, AST, code context extraction, LLM toolchain, TypeScript, call graph, Tree-sitter, code understanding
Published 2026-05-26 12:14 | Recent activity 2026-05-26 12:19 | Estimated read 6 min

Section 01

AST-Analyzer: Introduction to the Precise Code Context Extraction Engine for LLMs

AST-Analyzer is a static analysis tool written in Go. Given a TypeScript codebase, it parses the source into ASTs, builds a call graph, and extracts only the context related to a chosen symbol, replacing the traditional approach of "stuffing entire files into prompts". The project is maintained by jairo-litman, its source code is hosted on GitHub (https://github.com/jairo-litman/ast-analyzer), and it was released on May 26, 2026.

Section 02

Project Background and Motivation

Developers who feed code context to LLMs face a dilemma: stuffing entire files into a prompt floods the context window with irrelevant content, while selecting snippets by hand is slow and easily misses key dependencies. AST-Analyzer was started by students at São Paulo State University (UNESP) in Brazil to address this pain point with "surgical precision" extraction. Given a target symbol, it returns the symbol's definition body, the header of its containing class declaration, the signatures of its callers and callees, the type declarations it references, and the file's import statements.

Section 03

Core Technical Architecture

Dual Graph-Driven Dependency Analysis

  • Call Graph: Traverses TypeScript/TSX projects via the Tree-sitter parser to identify call relationships between functions, methods, and classes, answering "who calls me" and "who do I call".
  • Type Reference Graph: Tracks dependency relationships of explicit and inferred types, revealing implicit type contracts.
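
To make the two call graph questions concrete, here is a minimal sketch in Go that stores edges in both directions so that "who calls me" and "who do I call" are each a single lookup. The names CallGraph, AddCall, Callers, and Callees, as well as the symbol IDs, are hypothetical and chosen for illustration; they are not taken from the AST-Analyzer source.

```go
package main

import (
	"fmt"
	"sort"
)

// CallGraph keeps forward and reverse edges so each lookup is one map read.
type CallGraph struct {
	callees map[string]map[string]bool // caller -> set of callees
	callers map[string]map[string]bool // callee -> set of callers
}

func NewCallGraph() *CallGraph {
	return &CallGraph{
		callees: make(map[string]map[string]bool),
		callers: make(map[string]map[string]bool),
	}
}

// AddCall records that symbol `from` calls symbol `to`.
func (g *CallGraph) AddCall(from, to string) {
	if g.callees[from] == nil {
		g.callees[from] = make(map[string]bool)
	}
	if g.callers[to] == nil {
		g.callers[to] = make(map[string]bool)
	}
	g.callees[from][to] = true
	g.callers[to][from] = true
}

// sortedKeys returns the members of a set in a stable, sorted order.
func sortedKeys(set map[string]bool) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// Callers answers "who calls me".
func (g *CallGraph) Callers(sym string) []string { return sortedKeys(g.callers[sym]) }

// Callees answers "who do I call".
func (g *CallGraph) Callees(sym string) []string { return sortedKeys(g.callees[sym]) }

func main() {
	g := NewCallGraph()
	g.AddCall("OrderService.checkout", "PaymentClient.charge")
	g.AddCall("OrderService.checkout", "formatPrice")
	g.AddCall("CartView.render", "formatPrice")

	fmt.Println(g.Callers("formatPrice"))           // [CartView.render OrderService.checkout]
	fmt.Println(g.Callees("OrderService.checkout")) // [PaymentClient.charge formatPrice]
}
```

A type reference graph could plausibly reuse the same shape, with edges pointing from a symbol to the type declarations it mentions.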

Incremental Indexing Mechanism

Parsing results are persisted in SQLite, and only files whose content hash has changed are re-parsed, which makes the tool practical for daily use on large codebases.
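
The change-detection idea can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the project's code: an in-memory map stands in for the SQLite table, SHA-256 is assumed as the hash function (the article only says "content hashes"), and the names Index, hashContent, and NeedsReparse are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashContent returns a hex-encoded SHA-256 digest of a file's bytes.
// SHA-256 is an assumption; the article does not name the hash function.
func hashContent(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Index maps a file path to the hash of the content that was last parsed.
// A plain map stands in for the persistent SQLite table.
type Index map[string]string

// NeedsReparse reports whether the file is new or changed since the last
// call and, if so, records the new hash.
func (idx Index) NeedsReparse(path string, content []byte) bool {
	h := hashContent(content)
	if idx[path] == h {
		return false // unchanged: the stored parse result can be reused
	}
	idx[path] = h
	return true
}

func main() {
	idx := Index{}
	fmt.Println(idx.NeedsReparse("src/cart.ts", []byte("export const a = 1"))) // true (new file)
	fmt.Println(idx.NeedsReparse("src/cart.ts", []byte("export const a = 1"))) // false (unchanged)
	fmt.Println(idx.NeedsReparse("src/cart.ts", []byte("export const a = 2"))) // true (edited)
}
```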

Section 04

Features and Usage

Four-Step Workflow

  1. Index: Scan the project to build call graphs and type graphs
  2. List: View all symbols and their IDs
  3. Extract: Output the context for a given symbol ID in the chosen format
  4. Listen: Keep the index synchronized with file changes in real time during development

Output Formats

Supports three formats: JSON (structured), Redacted (multi-file source code view), and Markdown (directly usable for LLM prompts).

Slice Control Parameters

Precisely control the extraction scope via --caller-depth/--callee-depth, --caller-bodies-up-to/--callee-bodies-up-to, --type-depth, and --max-per-level.
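
One plausible reading of the depth and per-level limits is a breadth-first walk that stops after a fixed number of hops and caps how many symbols are kept at each hop. The Go sketch below implements that reading; the SliceOptions type, the collect function, and the exact semantics are assumptions for illustration, and the tool's actual behavior may differ.

```go
package main

import "fmt"

// SliceOptions mirrors two of the flags named above. The semantics given
// here are an assumption, not the tool's documented behavior.
type SliceOptions struct {
	Depth       int // number of hops to follow (cf. --caller-depth)
	MaxPerLevel int // cap on symbols kept per hop; 0 means no cap (cf. --max-per-level)
}

// collect walks edges breadth-first from start and returns the symbols
// discovered at each hop, nearest hop first.
func collect(edges map[string][]string, start string, opts SliceOptions) [][]string {
	seen := map[string]bool{start: true}
	frontier := []string{start}
	var levels [][]string

	for depth := 0; depth < opts.Depth && len(frontier) > 0; depth++ {
		var next []string
		for _, sym := range frontier {
			for _, neighbor := range edges[sym] {
				if seen[neighbor] {
					continue
				}
				if opts.MaxPerLevel > 0 && len(next) >= opts.MaxPerLevel {
					break // this hop is full; drop the remaining neighbors
				}
				seen[neighbor] = true
				next = append(next, neighbor)
			}
		}
		if len(next) == 0 {
			break
		}
		levels = append(levels, next)
		frontier = next
	}
	return levels
}

func main() {
	// callers maps a symbol to the symbols that call it.
	callers := map[string][]string{
		"formatPrice":           {"CartView.render", "OrderService.checkout", "InvoiceView.render"},
		"CartView.render":       {"App.render"},
		"OrderService.checkout": {"CheckoutPage.submit"},
	}
	levels := collect(callers, "formatPrice", SliceOptions{Depth: 2, MaxPerLevel: 2})
	for i, level := range levels {
		fmt.Printf("depth %d: %v\n", i+1, level)
	}
	// depth 1: [CartView.render OrderService.checkout]
	// depth 2: [App.render CheckoutPage.submit]
}
```

Running the same walk over callee edges, or over type reference edges with a separate depth, would cover the remaining depth flags under the same assumptions.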

Section 05

Practical Application Scenarios

  1. Code Review Assistance: Quickly obtain the complete context of a function (callers, callees, type definitions), which is more efficient than manual IDE navigation.
  2. LLM Code Generation: Extract just the right context to help the model understand code intent while avoiding irrelevant details.
  3. Legacy Code Analysis: Visualize dependency relationships via call graphs to quickly understand large, underdocumented codebases.

Section 06

Technical Implementation Highlights

  • Tree-sitter Parsing: Balances speed and fault tolerance; even if the code has syntax errors, it can still extract most valid information.
  • Complete Import Parsing: Supports complex scenarios such as tsconfig path aliases, default imports, namespace imports, and re-exports.
  • Class Inheritance Chain Handling: Automatically resolves the inheritance chain of class methods, so that calls made through this and super are attributed to the correct parent class definition.
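
The inheritance handling described in the last bullet can be pictured as a walk up the extends chain until a class that declares the method is found. The Go sketch below shows that walk; the Class type and resolveMethod function are hypothetical names for illustration and are not drawn from the project's source.

```go
package main

import "fmt"

// Class is a simplified record of a class declaration.
type Class struct {
	Name    string
	Parent  string          // empty when there is no extends clause
	Methods map[string]bool // methods declared directly on this class
}

// resolveMethod returns the nearest class, starting at `start` and moving
// up the inheritance chain, that declares `method`.
func resolveMethod(classes map[string]Class, start, method string) (string, bool) {
	visited := map[string]bool{}
	for name := start; name != ""; {
		if visited[name] {
			return "", false // guard against cyclic input data
		}
		visited[name] = true
		cls, ok := classes[name]
		if !ok {
			return "", false // parent is not in the index (e.g. external library)
		}
		if cls.Methods[method] {
			return cls.Name, true
		}
		name = cls.Parent
	}
	return "", false
}

func main() {
	classes := map[string]Class{
		"Base":  {Name: "Base", Methods: map[string]bool{"save": true, "validate": true}},
		"Model": {Name: "Model", Parent: "Base", Methods: map[string]bool{"validate": true}},
		"User":  {Name: "User", Parent: "Model", Methods: map[string]bool{"rename": true}},
	}

	// this.save() inside User: the search starts at User itself.
	owner, _ := resolveMethod(classes, "User", "save")
	fmt.Println(owner) // Base

	// super.validate() inside Model: the search starts at Model's parent.
	owner, _ = resolveMethod(classes, classes["Model"].Parent, "validate")
	fmt.Println(owner) // Base
}
```

The only difference between the two call forms in this sketch is the starting class: the enclosing class for this, and its parent for super.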

Section 07

Project Significance and Insights

AST-Analyzer represents a smarter approach to code context management: it gives the LLM toolchain a precise way to pass along only the code relevant to a symbol. Its open-source implementation offers an extensible foundation for the community; future work could refine the extraction strategies, support more languages, or integrate the tool into IDE plugins.