Zing Forum

Mercer: A Six-Stage Intelligent Text-to-SQL System for Production-Grade Messy Databases

Introducing Mercer, a Text-to-SQL system designed specifically for complex databases in production environments. It uses a six-stage agent-based pipeline, supports local GPU inference, and requires no vector database.

Text-to-SQL, Agentic Workflow, LLM, Database, Natural Language Processing, Local Inference, GitHub
Published 2026-05-07 21:14 | Recent activity 2026-05-07 21:19 | Estimated read 9 min

Section 01

Introduction: Mercer, a Six-Stage Intelligent System for Text-to-SQL on Production-Grade Messy Databases

Mercer is a Text-to-SQL system built for complex production databases. At its core is a six-stage agent-based pipeline; it supports local GPU inference and requires no vector database. It targets the "messy patterns" common in production databases (obscure table names, ambiguous fields, complex relationships, missing documentation) and avoids the inaccurate schema retrieval that vector-database-based approaches tend to suffer from, so that enterprises can query their databases in natural language in a secure and controllable way.


Section 02

Background: Barriers to Implementing Text-to-SQL for Production Databases

Converting natural language to SQL (Text-to-SQL) is a popular direction for LLM applications, but most open-source solutions perform poorly in production environments. After years of iteration, production databases accumulate "messy patterns": obscure table names, ambiguous field meanings, complex foreign key relationships, and outdated or missing documentation. Traditional methods rely on vector databases for schema retrieval; when a schema has thousands of tables and tens of thousands of fields, semantic similarity search easily returns irrelevant results, and the generated SQL drifts away from the user's intent.


Section 03

Core Design Philosophy of Mercer

Mercer is built on three design principles that target these pain points:

  1. Agent-based Phased Reasoning: Text-to-SQL is decomposed into six stages, each handled by a dedicated agent, so the search space narrows step by step and accuracy improves;
  2. Local-First Inference Architecture: The system runs on a local GPU with no need for external APIs or cloud services, which reduces cost and keeps sensitive data inside the local environment;
  3. Zero Vector Database Dependency: Instead of a vector database, Mercer uses a hybrid schema retrieval strategy that combines rules with semantic understanding, which is more interpretable and controllable.
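
To make the phased-reasoning idea concrete, here is a minimal Python sketch. It is not Mercer's code: `PipelineContext`, `run_pipeline`, and the two toy stages are hypothetical names, and simple string filters stand in for the agents. It only illustrates the shape of the design, where each stage receives a shared context, narrows the candidate set, and leaves a trace that can be inspected.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass
class PipelineContext:
    """Shared state handed from one stage to the next (hypothetical structure)."""
    question: str
    candidate_tables: list[str]
    trace: list[str] = field(default_factory=list)


Stage = Callable[[PipelineContext], PipelineContext]


def filter_by_keyword(ctx: PipelineContext) -> PipelineContext:
    """Toy stage: keep tables whose name shares a token with the question."""
    tokens = set(ctx.question.lower().split())
    kept = [t for t in ctx.candidate_tables if tokens & set(t.lower().split("_"))]
    return replace(ctx, candidate_tables=kept)


def drop_archive_tables(ctx: PipelineContext) -> PipelineContext:
    """Toy stage: apply a fixed rule that excludes archive tables."""
    kept = [t for t in ctx.candidate_tables if not t.endswith("_archive")]
    return replace(ctx, candidate_tables=kept)


def run_pipeline(ctx: PipelineContext, stages: list[Stage]) -> PipelineContext:
    """Run stages in order and record how many candidates survive each one."""
    for stage in stages:
        ctx = stage(ctx)
        ctx.trace.append(f"{stage.__name__}: {len(ctx.candidate_tables)} tables")
    return ctx


if __name__ == "__main__":
    start = PipelineContext(
        question="total sales per product",
        candidate_tables=["product_sales", "product_sales_archive", "hr_payroll"],
    )
    result = run_pipeline(start, [filter_by_keyword, drop_archive_tables])
    print(result.candidate_tables)
    print(result.trace)
```

Because every stage has the same signature, a stage can be swapped or tested in isolation, and the trace shows where a wrong table was dropped or kept.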

Section 04

Detailed Explanation of the Six-Stage Agent-Based Pipeline

The six-stage pipeline is Mercer's main architectural contribution. The stages are:

  1. Intent Parsing and Entity Recognition: Understand the core intent of the user's query (e.g., get the top 10 product sales), identify key entities (time, metrics, business objects);
  2. Candidate Table Filtering: Calculate relevance scores via a lightweight table-level semantic matching algorithm (combining table names, column names, comments) to quickly narrow down the table scope;
  3. Column-Level Precise Positioning: Analyze columns in candidate tables, handle inconsistent naming (e.g., sales_amount vs. revenue), and match relevant columns using a business term mapping dictionary;
  4. Relationship Path Construction: Build connection paths between tables, prioritize clear foreign key relationships, and generate multiple candidate paths for ambiguous associations;
  5. SQL Sketch Generation: Construct the basic skeleton of the query (SELECT/FROM/JOIN/WHERE, etc.) to facilitate early verification and correction;
  6. Final SQL Synthesis and Validation: Convert to complete executable SQL, handle dialect differences, and validate syntax, existence of tables/columns, and query cost.
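
Stage 4 is the most algorithmic of the six, so a small sketch may help. The Python below finds a join path between two tables by breadth-first search over declared foreign keys. The article does not say which algorithm Mercer uses; BFS is an assumption made for illustration, the `FOREIGN_KEYS` data is invented, and the sketch covers only the "prioritize clear foreign key relationships" part, not the generation of multiple candidate paths.

```python
from collections import deque

# Hypothetical foreign keys: (child_table, child_column, parent_table, parent_column)
FOREIGN_KEYS = [
    ("orders", "customer_id", "customers", "id"),
    ("order_items", "order_id", "orders", "id"),
    ("order_items", "product_id", "products", "id"),
]


def build_graph(foreign_keys):
    """Turn foreign keys into an undirected graph: table -> [(neighbor, join condition)]."""
    graph = {}
    for child, child_col, parent, parent_col in foreign_keys:
        condition = f"{child}.{child_col} = {parent}.{parent_col}"
        graph.setdefault(child, []).append((parent, condition))
        graph.setdefault(parent, []).append((child, condition))
    return graph


def shortest_join_path(graph, start, goal):
    """Return [(table_to_join, condition), ...] from start to goal, or None if unreachable."""
    if start == goal:
        return []
    came_from = {start: None}  # table -> (previous table, join condition)
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for neighbor, condition in graph.get(table, []):
            if neighbor in came_from:
                continue
            came_from[neighbor] = (table, condition)
            if neighbor == goal:
                # Walk back from the goal to the start to recover the path.
                steps, node = [], goal
                while came_from[node] is not None:
                    previous, cond = came_from[node]
                    steps.append((node, cond))
                    node = previous
                return list(reversed(steps))
            queue.append(neighbor)
    return None


def render_from_clause(start, steps):
    """Render the path as the FROM/JOIN part of a SQL sketch."""
    lines = [f"FROM {start}"]
    lines += [f"JOIN {table} ON {condition}" for table, condition in steps]
    return "\n".join(lines)


if __name__ == "__main__":
    graph = build_graph(FOREIGN_KEYS)
    path = shortest_join_path(graph, "customers", "products")
    print(render_from_clause("customers", path))
```

The rendered FROM/JOIN fragment is the kind of skeleton stage 5 could build on before the full query is synthesized and validated in stage 6.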

Section 05

Technical Implementation Highlights

Mercer's technical highlights include:

  1. Local GPU Inference Optimization: Tuned for consumer-grade GPUs (e.g., RTX 3090/4090); model quantization, batch inference, and state caching bring responses close to real time, which suits industries with strict data privacy requirements;
  2. Schema Management Without Vector Databases: Adopt hybrid retrieval combining inverted indexes and rule matching, avoiding vector index maintenance costs and reducing system complexity;
  3. Extensible Plugin Architecture: Modular interfaces support custom database dialects (PostgreSQL/MySQL, etc.) or enterprise-specific needs, with each stage independently extensible and replaceable.
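
As an illustration of the second highlight, the following Python sketch combines an inverted index over schema tokens with a small rule table that maps business words to schema abbreviations. Everything here is assumed for the example: the `SCHEMA` contents, the `TERM_RULES` mapping, and the scoring (one point per matching token) are not taken from Mercer.

```python
import re
from collections import defaultdict

# Hypothetical schema: table -> (column names, table comment)
SCHEMA = {
    "t_ord_hdr": (["ord_id", "cust_id", "ord_amt", "created_at"], "order header"),
    "t_cust": (["cust_id", "cust_name", "region"], "customer master data"),
    "t_prod": (["prod_id", "prod_name", "unit_price"], "product catalog"),
}

# Hypothetical rules: business word -> abbreviations it may appear as in the schema
TERM_RULES = {
    "order": ["ord"],
    "customer": ["cust"],
    "revenue": ["amt"],
    "amount": ["amt"],
    "product": ["prod"],
}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(schema):
    """Map each token found in table names, column names, or comments to its tables."""
    index = defaultdict(set)
    for table, (columns, comment) in schema.items():
        for text in [table, comment, *columns]:
            for token in tokenize(text):
                index[token].add(table)
    return index


def rank_tables(question, index, rules):
    """Score tables by direct token hits plus rule-based expansions of each word."""
    scores = defaultdict(int)
    for word in tokenize(question):
        for token in [word, *rules.get(word, [])]:
            for table in index.get(token, ()):
                scores[table] += 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    index = build_inverted_index(SCHEMA)
    for table, score in rank_tables("revenue per customer", index, TERM_RULES):
        print(table, score)
```

Every score can be traced to a specific token or rule, which is the interpretability argument made above; the cost is that the rule table has to be maintained by hand.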

Section 06

Application Scenarios and Value

Typical application scenarios and value of Mercer:

  • Business Analyst Data Exploration: No need to memorize table structures or SQL syntax; quickly gain data insights using natural language;
  • Customer Service System Backend Queries: Customer service staff can directly query information such as customer order status, improving efficiency;
  • Data Governance and Auditing: Auditors can easily check data quality and compliance via natural language.

Section 07

Limitations and Future Directions

Limitations of Mercer:

  1. The six-stage pipeline increases latency, which is not ideal for millisecond-level response scenarios;
  2. Performance is limited for extremely messy, undocumented legacy databases.

Future Directions:

  • Introduce lightweight end-to-end models as alternative paths;
  • Enhance multi-turn dialogue support;
  • Develop automated schema documentation generation tools to assist knowledge base construction.

Section 08

Conclusion: The Significance of Mercer for Text-to-SQL Implementation

Mercer marks a step in moving Text-to-SQL from academic research toward production practice. Its six-stage agent-based architecture, local-first deployment model, and handling of messy schemas offer a useful reference for engineering practice. For teams exploring secure and reliable LLM-based access to enterprise data, Mercer is worth studying in depth.