Zing Forum

SelRoute: A Query Type-Aware Routing Framework for Conversational History Retrieval

SelRoute dynamically routes queries to specialized retrieval pipelines (lexical, semantic, hybrid, or lexicon-enhanced) based on query type. Without requiring a GPU or LLM inference, it lets small models surpass large ones in retrieval performance, and it reveals the asymmetric impact of lexicon expansion on lexical versus embedding retrieval.

Tags: Conversational Retrieval · Query Routing · Lexical Retrieval · Semantic Retrieval · Long-Range Memory · Retrieval Optimization · Lightweight Models
Published 2026-04-03 02:02 · Recent activity 2026-04-06 09:51 · Estimated read: 6 min

Section 01

SelRoute Framework Guide: A New Query Type-Aware Solution for Conversational History Retrieval

SelRoute is a query type-aware routing framework for conversational history retrieval. By dynamically selecting a specialized retrieval pipeline (lexical, semantic, hybrid, or lexicon-enhanced) per query, it lets small models surpass large ones in retrieval performance without requiring a GPU or LLM inference, and it reveals the asymmetric impact of lexicon expansion on lexical versus embedding retrieval.


Section 02

Problem Background: Challenges in Long-Range Conversational Memory Retrieval and Limitations of Existing Solutions

In long-running conversational systems, retrieving relevant historical interactions is key to coherent, personalized responses, but it is complicated by long time spans, diverse topics, and dynamically shifting intents. Traditional solutions rely on large dense models (110M-1.5B parameters) or LLM-enhanced indexing. While effective, they require expensive GPU resources, and LLM-based solutions add high latency and cost, creating an urgent need for lightweight, efficient alternatives.


Section 03

Core Innovation: Query Type-Aware Dynamic Routing Strategy

The core of SelRoute is routing to specialized pipelines based on query types:

  • Lexical retrieval: Suitable for precise term queries
  • Semantic retrieval: Suitable for conceptual/synonymous expression queries
  • Hybrid retrieval: Suitable for complex multi-intent queries
  • Lexicon-enhanced retrieval: Expands the lexicon at storage time; suitable for queries needing semantic expansion

Each pipeline is optimized for a specific query type, avoiding the resource waste of a one-size-fits-all approach.
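The routing idea above can be sketched with simple surface features of the query. This is an illustrative assumption, not SelRoute's actual classifier; the rules and thresholds here are made up for demonstration:

```python
import re

# Hypothetical query-type router: the rules and thresholds below are
# illustrative assumptions, not SelRoute's published classifier.
PIPELINES = ("lexical", "semantic", "hybrid", "lexicon_enhanced")

def route(query: str) -> str:
    """Pick a retrieval pipeline from surface features of the query."""
    tokens = query.lower().split()
    quoted = '"' in query                                  # exact-term intent
    has_identifier = any(t.isupper() for t in query.split())
    multi_intent = bool(re.search(r"\band\b|;", query.lower()))
    if quoted or has_identifier:
        return "lexical"            # precise terms / identifiers
    if multi_intent and len(tokens) > 12:
        return "hybrid"             # complex multi-intent query
    if len(tokens) <= 4:
        return "lexicon_enhanced"   # short query likely benefits from expansion
    return "semantic"               # default: conceptual / paraphrased wording

print(route('find the "FTS5" flag'))  # lexical
```

A real router would be trained on labeled query types, but even a rule-based dispatcher like this keeps the per-query cost at effectively zero.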

Section 04

Performance: Empirical Evidence of Small Models Surpassing Large Models

On the LongMemEval_M benchmark:

  • bge-base-en-v1.5 (109M parameters): Recall@5 = 0.800
  • bge-small-en-v1.5 (33M parameters): Recall@5 = 0.786
  • Comparison: Contriever with LLM-generated fact keys reaches only 0.762

Unexpected finding: SQLite FTS5, with zero machine learning, achieves NDCG@5 = 0.692, surpassing all published baselines. Five-fold cross-validation shows stable routing (coefficient of variation: 1.3-2.4 points) and 83% routing accuracy, with end-to-end retrieval still outperforming unified baselines.
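For reference, the two metrics reported above can be computed as follows. This is a minimal sketch assuming binary relevance judgments; the document ids are made up:

```python
import math

def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant items that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def ndcg_at_k(retrieved, relevant, k=5):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

retrieved = ["d3", "d7", "d1", "d9", "d2"]   # ranked system output
relevant = {"d1", "d2"}                      # gold relevant documents
print(recall_at_k(retrieved, relevant))      # 1.0: both relevant docs in top 5
```

NDCG additionally rewards placing relevant documents earlier in the ranking, which is why FTS5's 0.692 NDCG@5 is a rank-sensitive result, not just a hit count.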

Section 05

Generalization Ability and Boundary Conditions

Without any tuning, it performs well across 8 benchmarks (including MSDialog and LoCoMo, over 62k instances in total), showing strong generalization. However, on the reasoning-intensive RECOR benchmark, Recall@5 is only 0.149, a clearly defined failure mode that delineates where the framework applies.


Section 06

Asymmetric Effect of Lexicon Expansion and Its Implications

Lexicon expansion during storage improves lexical retrieval but impairs embedding retrieval performance (enrichment-embedding asymmetry). Engineering implications:

  • Lexical pipeline: Actively expand lexicon during indexing
  • Semantic pipeline: Keep original text to avoid noise
In short, different pipelines require differentiated enrichment strategies.
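The differentiated-enrichment idea can be sketched as follows: expand terms only for the lexical index while keeping raw text for the embedding corpus. The synonym table and function names are illustrative assumptions, not SelRoute's code:

```python
# Toy synonym table; a real system would use a curated or learned lexicon.
SYNONYMS = {"car": ["automobile", "vehicle"], "buy": ["purchase"]}

def expand(text: str) -> str:
    """Append synonyms so lexical matching catches alternate wordings."""
    extra = [s for w in text.lower().split() for s in SYNONYMS.get(w, [])]
    return text + (" " + " ".join(extra) if extra else "")

def index_turn(turn: str, lexical_index: list, embedding_corpus: list):
    # Asymmetric enrichment: expansion helps BM25/FTS term matching but
    # would add noise to the text fed to an embedding model.
    lexical_index.append(expand(turn))   # enriched copy for lexical retrieval
    embedding_corpus.append(turn)        # raw copy for semantic retrieval

lex, emb = [], []
index_turn("I want to buy a car", lex, emb)
print(lex[0])  # "I want to buy a car purchase automobile vehicle"
print(emb[0])  # "I want to buy a car"
```

Storing both variants costs extra space but lets each pipeline see the representation it benefits from.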

Section 07

Practical Deployment Advantages: Lightweight, Efficient, and Easy to Implement

SelRoute is deployment-friendly:

  1. Zero GPU requirement (runs entirely on CPU)
  2. Zero LLM inference (no large-model calls at query time)
  3. Lightweight models (the base version has only 109M parameters)
  4. Based on SQLite (leveraging a mature, battle-tested database)

It suits edge devices, high-concurrency low-latency services, cost-sensitive deployments, and privacy-focused local setups.

Section 08

Research Implications and Future Outlook

Implications: specialization beats generalization; lightweight solutions deserve attention; limitations should be faced honestly; preprocessing strategies need differentiation. Outlook: improve support for reasoning-intensive queries and adapt routing strategies dynamically. SelRoute offers dialogue-system developers a high-quality option for cost- and latency-sensitive scenarios.