Zing Forum

Reading

YORO Hybrid Architecture: A "Retrieve Once Only" Intelligent Routing Solution for Text-to-SQL

An innovative Text-to-SQL generation architecture that intelligently routes queries to three reasoning paths (purely parameterized, hybrid compression, or full Graph-RAG) via a lightweight router, achieving an 80% reduction in prompt tokens.

Text-to-SQL大语言模型RAG令牌优化智能路由数据库微调开源项目
Published 2026-06-17 07:06Recent activity 2026-06-17 07:20Estimated read 8 min
YORO Hybrid Architecture: A "Retrieve Once Only" Intelligent Routing Solution for Text-to-SQL
1

Section 01

YORO Hybrid Architecture: Introduction to the Intelligent Routing Solution for Text-to-SQL

The YORO Hybrid Architecture is an innovative solution addressing token cost issues in the Text-to-SQL domain. It intelligently routes queries to three reasoning paths (purely parameterized, hybrid compression, or full Graph-RAG) via a lightweight router, achieving an 80% reduction in prompt tokens.

Original Author/Maintainer: Dhritimannandi Source Platform: GitHub Original Link: https://github.com/Dhritimannandi/yoro-hybrid-architecture Publication Date: June 16, 2026

2

Section 02

Project Background: The Token Cost Dilemma of Text-to-SQL

Traditional Text-to-SQL solutions often use the RAG pattern, inputting the complete database schema as context into large models. However, database schemas usually contain a large number of tables and fields, leading to prompt tokens consuming a lot of tokens, increasing API costs and occupying context space.

Core Insight of YORO (You Only Retrieve Once) Hybrid Architecture: Not all queries require complete schema information; models can internalize schema knowledge during training, enabling 'zero schema tokens' for some queries.

3

Section 03

Core Innovation: Three-Path Intelligent Routing and Router Implementation

Three-Path Intelligent Routing

  • Path A (YORO Purely Parameterized):Suitable for standard aggregation queries; prompts only include database ID and question, with an average of 50 tokens (model has internalized the schema).
  • Path B (YORO Hybrid):Suitable for medium-complexity problems; extracts and compresses a subset of the schema; prompts include database ID, question, and compressed subset, with an average of 500-800 tokens.
  • Path C (Graph-RAG Fallback):Suitable for complex queries; prompts include the complete compressed schema, with an average of 3900 tokens (fallback solution).

Router Implementation

No additional LLM calls are needed; it uses keyword complexity scoring (completed in 1 millisecond):

  • Complexity-increasing signals: geographic joins, statistical analysis, data reconciliation, window functions, question length >120 characters.
  • Complexity-decreasing signals: TOP-N queries, single aggregation, time filtering, common business vocabulary.
  • Threshold rules: Score <0.55 → Path A; <0.8 → Path B; else → Path C.
4

Section 04

Detailed Explanation of Architecture Components

The project includes five core modules:

  1. Schema Analyzer: Reads DKL Excel to generate three schema representations: CodeS, PICARD, and YORO prompts.
  2. Synthetic Data Generator: Three-stage process (skeleton extraction → SQL generation → NLQ generation).
  3. Fine-tuning Formatter: Supports OpenAI/Azure and HuggingFace/PEFT formats; controls the proportion of training data via hybrid_ratio.
  4. Hybrid Inference Router: Implements complexity scoring and path selection. 5.** Pipeline Orchestrator**: Provides CLI interface (setup/benchmark/generate modes).
5

Section 05

Benchmark Testing: Empirical Results of 80% Token Reduction

In tests on the Olist Brazilian e-commerce dataset with 44 questions:

Path Number of Questions Proportion Average Tokens Reduction vs. Baseline
A - YORO Pure 26 60% 50 -98.7%
B - YORO Hybrid 11 25% ~700 -82%
C - Graph-RAG 7 15% ~3900 0%
Weighted Hybrid 44 100% ~560 -85.6%

Overall, it achieves approximately 80% token reduction while maintaining SQL accuracy.

6

Section 06

Technical Insights: General Efficiency Optimization Ideas and Application Scenarios

General idea of YORO architecture: Adaptive resource allocation through problem complexity analysis, which can be extended to:

  • Document Q&A: Use lightweight models for simple questions, large models for complex ones.
  • Code generation: Use cached templates for common patterns, full generation for novel requirements.
  • Multimodal processing: Choose different pipelines based on input features.

Key: Finding appropriate 'complexity proxy metrics' (heuristic rules) enables effective resource allocation.

7

Section 07

Limitations and Considerations

Limitations of YORO:

  1. Fine-tuning of expert models requires domain-specific training data; migrating to a new database requires re-synthesizing data and re-fine-tuning.
  2. The complexity scorer is based on heuristic rules; uncovered query types may lead to routing errors (Path C serves as a fallback but requires monitoring and optimization).
  3. The current implementation is targeted at the Olist dataset; performance on complex enterprise-level databases needs further verification.
8

Section 08

Conclusion: Importance and Insights of Efficiency Optimization

The YORO Hybrid Architecture brings a new idea for efficiency optimization in Text-to-SQL: treating queries differently instead of uniformly. This 'on-demand allocation' philosophy is applicable to a wider range of AI system designs. In today's era where computing costs are a concern, such efficiency innovations will become increasingly important.