# SelRoute: A Query Type-Aware Routing Framework for Conversational History Retrieval

> SelRoute routes each query to a specialized retrieval pipeline (lexical, semantic, hybrid, or lexicon-enhanced) according to its query type. Without requiring a GPU or LLM inference, it lets small models outperform larger ones on retrieval, and it reveals that lexicon expansion affects lexical and embedding retrieval asymmetrically.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-02T18:02:59.000Z
- Last activity: 2026-04-06T01:51:16.839Z
- Popularity: 86.0
- Keywords: conversational retrieval, query routing, lexical retrieval, semantic retrieval, long-range memory, retrieval optimization, lightweight models
- Page link: https://www.zingnex.cn/en/forum/thread/selroute
- Canonical: https://www.zingnex.cn/forum/thread/selroute

---

## SelRoute Framework Guide: A New Query Type-Aware Solution for Conversational History Retrieval

SelRoute is a query type-aware routing framework for conversational history retrieval. It selects a specialized retrieval pipeline (lexical, semantic, hybrid, or lexicon-enhanced) for each query, needs neither a GPU nor LLM inference, and allows small models to outperform larger ones. It also shows that lexicon expansion has an asymmetric effect: it affects lexical retrieval and embedding retrieval differently.

## Problem Background: Challenges in Long-Range Conversational Memory Retrieval and Limitations of Existing Solutions

In long-running conversational systems, retrieving relevant past interactions is key to coherent, personalized responses, but it is complicated by long time spans, diverse topics, and shifting intents. Existing solutions rely on large dense models (110M-1.5B parameters) or on LLM-enhanced indexing. These approaches are effective, but dense models require expensive GPU resources and LLM-based solutions add latency and cost, which creates a pressing need for lightweight, efficient alternatives.

## Core Innovation: Query Type-Aware Dynamic Routing Strategy

The core of SelRoute is routing each query to a specialized pipeline based on its type:

- Lexical retrieval: suited to queries that name precise terms
- Semantic retrieval: suited to conceptual queries and synonymous phrasing
- Hybrid retrieval: suited to complex, multi-intent queries
- Lexicon-enhanced retrieval: expands the lexicon at storage time, suited to queries that need semantic expansion

Each pipeline is optimized for its own query types, which avoids the wasted resources of a one-size-fits-all approach.
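
The routing idea above can be sketched in a few lines of Python. This is only an illustration: the query type names, the regex heuristics in `classify_query`, and the `ROUTES` table are all assumptions made for the example, since the summary does not describe how SelRoute actually classifies queries.

```python
import re
from enum import Enum


class Pipeline(Enum):
    LEXICAL = "lexical"
    SEMANTIC = "semantic"
    HYBRID = "hybrid"
    LEXICON_ENHANCED = "lexicon_enhanced"


# Hypothetical query types and routing table (not taken from the source).
ROUTES = {
    "exact_term": Pipeline.LEXICAL,
    "conceptual": Pipeline.SEMANTIC,
    "multi_intent": Pipeline.HYBRID,
    "vocabulary_gap": Pipeline.LEXICON_ENHANCED,
}


def classify_query(query: str) -> str:
    """Toy rule-based classifier; the heuristics are illustrative only."""
    q = query.lower()
    # Quoted strings or digits suggest the user wants a precise term match.
    if re.search(r'"[^"]+"|\d', q):
        return "exact_term"
    # Conjunctions joining clauses suggest more than one intent.
    if re.search(r"\b(and|also|as well as)\b", q):
        return "multi_intent"
    # Vague references suggest the stored wording may differ from the query.
    if re.search(r"\b(something|stuff|kind of|that thing)\b", q):
        return "vocabulary_gap"
    return "conceptual"


def route(query: str) -> Pipeline:
    """Pick the retrieval pipeline for a query based on its predicted type."""
    return ROUTES[classify_query(query)]
```

Because the classifier here is a handful of regular expressions, routing costs almost nothing at query time, which is consistent with the claim that no GPU or LLM call is needed; a real router would presumably use richer signals.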

## Performance: Empirical Evidence of Small Models Surpassing Large Models

On the LongMemEval_M benchmark:

- bge-base-en-v1.5 (109M parameters): Recall@5 = 0.800
- bge-small-en-v1.5 (33M parameters): Recall@5 = 0.786
- For comparison, Contriever with LLM-generated fact keys reaches only 0.762

An unexpected finding: SQLite FTS5, which involves no machine learning at all, reaches NDCG@5 = 0.692, surpassing all published baselines. Five-fold cross-validation shows that routing is stable (coefficient of variation: 1.3-2.4 points) with 83% routing accuracy, and end-to-end retrieval still outperforms unified baselines.
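
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how Recall@k and NDCG@k are commonly computed with binary relevance. The function names and the exact definitions are assumptions for illustration; the benchmark's own evaluation script may define recall differently (for example, as "any relevant item in the top k").

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_ids, relevant_ids, k=5):
    """NDCG@k with binary relevance (gain 1 if relevant, 0 otherwise)."""
    relevant = set(relevant_ids)
    # Discounted gain: a hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    # Ideal ranking places all relevant items (up to k of them) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Both functions assume `ranked_ids` contains no duplicates. Recall@k only asks whether relevant items made it into the top k, while NDCG@k also rewards placing them near the top.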

## Generalization Ability and Boundary Conditions

Without any tuning, SelRoute performs well across 8 benchmarks (including MSDialog and LoCoMo, with over 62k instances), which indicates strong generalization. On the reasoning-intensive RECOR benchmark, however, Recall@5 is only 0.149. This result clearly identifies a failure mode and helps delineate the scenarios where the framework applies.

## Asymmetric Effect of Lexicon Expansion and Its Implications

Lexicon expansion at storage time improves lexical retrieval but degrades embedding retrieval (the enrichment-embedding asymmetry). The engineering implications are:

- Lexical pipeline: actively expand the lexicon during indexing
- Semantic pipeline: keep the original text to avoid introducing noise

In short, different pipelines require different enrichment strategies.
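
One way to act on this asymmetry is to store two views of every conversation turn: an enriched one for the keyword index and the untouched original for the embedding model. The sketch below assumes a tiny hand-written `LEXICON` and hypothetical names (`IndexedTurn`, `enrich`, `prepare`); the source does not say how SelRoute builds or applies its lexicon.

```python
from dataclasses import dataclass

# Hypothetical expansion lexicon, invented for this example.
LEXICON = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician"],
}


@dataclass
class IndexedTurn:
    turn_id: str
    lexical_text: str    # original text plus expansion terms, for the keyword index
    embedding_text: str  # original text only, for the embedding model


def enrich(text: str) -> str:
    """Append lexicon expansions for every word found in the lexicon."""
    extra = []
    for word in text.lower().split():
        extra.extend(LEXICON.get(word.strip(".,!?"), []))
    return text if not extra else text + " " + " ".join(extra)


def prepare(turn_id: str, text: str) -> IndexedTurn:
    """Build both views of a turn: enriched for lexical, raw for embeddings."""
    return IndexedTurn(turn_id, lexical_text=enrich(text), embedding_text=text)
```

Keeping the two views separate means the extra terms can help keyword matching without being fed to the encoder, where the finding above suggests they act as noise.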

## Practical Deployment Advantages: Lightweight, Efficient, and Easy to Implement

SelRoute is deployment-friendly:

1. Zero GPU requirement (it runs on CPU)
2. Zero LLM inference (no large model calls at query time)
3. Lightweight models (the base version has only 109M parameters)
4. Built on SQLite (a mature, widely deployed database)

This makes it suitable for edge devices, high-concurrency low-latency services, cost-sensitive applications, and privacy-focused local deployments.
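
To give a feel for the SQLite side, here is a minimal lexical retrieval sketch using Python's built-in `sqlite3` module and the FTS5 extension. The table name `turns`, its columns, and the sample rows are invented for illustration, and the code assumes the local SQLite build has FTS5 compiled in (true for most current Python distributions, but not guaranteed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; session_id is stored but not full-text indexed.
conn.execute("CREATE VIRTUAL TABLE turns USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO turns (session_id, content) VALUES (?, ?)",
    [
        ("s1", "I booked a dentist appointment for Tuesday morning."),
        ("s2", "My favourite pizza place is near the station."),
        ("s3", "The appointment with the bank was rescheduled."),
    ],
)


def search(conn, query, k=5):
    """Return up to k (session_id, content) rows, best BM25 match first."""
    tokens = query.split()
    if not tokens:
        return []
    # Quote each token so punctuation is not parsed as FTS5 query syntax.
    match = " OR ".join('"' + t.replace('"', '""') + '"' for t in tokens)
    rows = conn.execute(
        "SELECT session_id, content FROM turns "
        "WHERE turns MATCH ? ORDER BY bm25(turns) LIMIT ?",
        (match, k),
    )
    return rows.fetchall()


results = search(conn, "dentist appointment")
```

FTS5's `bm25()` returns smaller values for better matches, so the default ascending `ORDER BY` puts the most relevant turn first. Everything here runs in-process on CPU with no model at all.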

## Research Implications and Future Outlook

Implications: specialization beats generalization; lightweight solutions deserve attention; limitations should be reported honestly; and preprocessing strategies need to be differentiated by pipeline. Outlook: improve support for reasoning-intensive queries and make routing strategies adaptive. Overall, SelRoute offers dialogue system developers a high-quality option for cost-sensitive and latency-sensitive scenarios.