# YORO Hybrid Architecture: A "Retrieve Once Only" Intelligent Routing Solution for Text-to-SQL

> An innovative Text-to-SQL generation architecture that intelligently routes queries to three reasoning paths (purely parameterized, hybrid compression, or full Graph-RAG) via a lightweight router, achieving an 80% reduction in prompt tokens.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-06-16T23:06:33.000Z
- 最近活动: 2026-06-16T23:20:58.971Z
- 热度: 159.8
- 关键词: Text-to-SQL, 大语言模型, RAG, 令牌优化, 智能路由, 数据库, 微调, 开源项目
- 页面链接: https://www.zingnex.cn/en/forum/thread/yoro-text-to-sql
- Canonical: https://www.zingnex.cn/forum/thread/yoro-text-to-sql
- Markdown 来源: floors_fallback

---

## YORO Hybrid Architecture: Introduction to the Intelligent Routing Solution for Text-to-SQL

The YORO Hybrid Architecture is an innovative solution addressing token cost issues in the Text-to-SQL domain. It intelligently routes queries to three reasoning paths (purely parameterized, hybrid compression, or full Graph-RAG) via a lightweight router, achieving an 80% reduction in prompt tokens.

Original Author/Maintainer: Dhritimannandi
Source Platform: GitHub
Original Link: https://github.com/Dhritimannandi/yoro-hybrid-architecture
Publication Date: June 16, 2026

## Project Background: The Token Cost Dilemma of Text-to-SQL

Traditional Text-to-SQL solutions often use the RAG pattern, inputting the complete database schema as context into large models. However, database schemas usually contain a large number of tables and fields, leading to prompt tokens consuming a lot of tokens, increasing API costs and occupying context space.

Core Insight of YORO (You Only Retrieve Once) Hybrid Architecture: Not all queries require complete schema information; models can internalize schema knowledge during training, enabling 'zero schema tokens' for some queries.

## Core Innovation: Three-Path Intelligent Routing and Router Implementation

### Three-Path Intelligent Routing
- **Path A (YORO Purely Parameterized)**：Suitable for standard aggregation queries; prompts only include database ID and question, with an average of 50 tokens (model has internalized the schema).
- **Path B (YORO Hybrid)**：Suitable for medium-complexity problems; extracts and compresses a subset of the schema; prompts include database ID, question, and compressed subset, with an average of 500-800 tokens.
- **Path C (Graph-RAG Fallback)**：Suitable for complex queries; prompts include the complete compressed schema, with an average of 3900 tokens (fallback solution).

### Router Implementation
No additional LLM calls are needed; it uses keyword complexity scoring (completed in 1 millisecond):
- Complexity-increasing signals: geographic joins, statistical analysis, data reconciliation, window functions, question length >120 characters.
- Complexity-decreasing signals: TOP-N queries, single aggregation, time filtering, common business vocabulary.
- Threshold rules: Score <0.55 → Path A; <0.8 → Path B; else → Path C.

## Detailed Explanation of Architecture Components

The project includes five core modules:
1. **Schema Analyzer**: Reads DKL Excel to generate three schema representations: CodeS, PICARD, and YORO prompts.
2. **Synthetic Data Generator**: Three-stage process (skeleton extraction → SQL generation → NLQ generation).
3. **Fine-tuning Formatter**: Supports OpenAI/Azure and HuggingFace/PEFT formats; controls the proportion of training data via hybrid_ratio.
4. **Hybrid Inference Router**: Implements complexity scoring and path selection.
5.** Pipeline Orchestrator**: Provides CLI interface (setup/benchmark/generate modes).

## Benchmark Testing: Empirical Results of 80% Token Reduction

In tests on the Olist Brazilian e-commerce dataset with 44 questions:
| Path | Number of Questions | Proportion | Average Tokens | Reduction vs. Baseline |
|------|---------------------|------------|----------------|------------------------|
| A - YORO Pure |26 |60% |50 |-98.7% |
| B - YORO Hybrid |11 |25% |~700 |-82% |
| C - Graph-RAG |7 |15% |~3900 |0% |
| **Weighted Hybrid** |**44** |**100%** |**~560** |**-85.6%** |

Overall, it achieves approximately 80% token reduction while maintaining SQL accuracy.

## Technical Insights: General Efficiency Optimization Ideas and Application Scenarios

General idea of YORO architecture: Adaptive resource allocation through problem complexity analysis, which can be extended to:
- Document Q&A: Use lightweight models for simple questions, large models for complex ones.
- Code generation: Use cached templates for common patterns, full generation for novel requirements.
- Multimodal processing: Choose different pipelines based on input features.

Key: Finding appropriate 'complexity proxy metrics' (heuristic rules) enables effective resource allocation.

## Limitations and Considerations

Limitations of YORO:
1. Fine-tuning of expert models requires domain-specific training data; migrating to a new database requires re-synthesizing data and re-fine-tuning.
2. The complexity scorer is based on heuristic rules; uncovered query types may lead to routing errors (Path C serves as a fallback but requires monitoring and optimization).
3. The current implementation is targeted at the Olist dataset; performance on complex enterprise-level databases needs further verification.

## Conclusion: Importance and Insights of Efficiency Optimization

The YORO Hybrid Architecture brings a new idea for efficiency optimization in Text-to-SQL: treating queries differently instead of uniformly. This 'on-demand allocation' philosophy is applicable to a wider range of AI system designs. In today's era where computing costs are a concern, such efficiency innovations will become increasingly important.
