Zing Forum

OBELISK: An Intelligent Query Optimization System Combining Bayesian Optimization and LLM Reasoning

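OBELISK is a database query optimization framework that combines Bayesian optimization with large language model (LLM) reasoning. In an offline planning phase it draws on historical observations and the LLM's reasoning ability to generate high-quality query execution plan configurations, with the aim of significantly improving the performance of complex SQL queries.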

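Tags: Bayesian optimization, large language models, query optimization, database tuning, LLM reasoning, Query Planning, TiDB, configuration optimization
Published 2026/05/30 15:08 | Last activity 2026/05/30 15:19 | Estimated reading time: 7 minutes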

Section 01

OBELISK: A Smart Query Optimization System Combining Bayesian Optimization and LLM Reasoning

OBELISK is an offline query planning framework for databases that integrates Bayesian optimization with large language model (LLM) reasoning. It uses historical observations and the LLM's reasoning ability to generate high-quality query execution plan configurations, with the goal of improving the performance of complex SQL queries. Its two key components are Guider (the Bayesian optimization engine) and ConfigurationReasoner (the LLM-based reasoning module). The project is open source under the MIT License, maintained by DaSECandyLab, and available on GitHub.

Section 02

Background & Motivation of OBELISK

Traditional database query optimizers rely on simplified statistics and heuristic rules, which struggle as data volumes grow and query patterns become more intricate. OBELISK was developed to address this by combining Bayesian optimization (for efficient configuration search) with LLM reasoning (for intelligent configuration generation), so that good database parameters and execution plan strategies can be found automatically.

Section 03

System Architecture & Core Mechanisms

OBELISK's architecture consists of two main components:

  1. Guider: The core optimization engine, with two strategies: Vanilla GP (a standard Gaussian process) and TCBO (Trust-Constrained Bayesian Optimization, aimed at timeout constraints). It samples candidates with Sobol sequences and falls back to Latin hypercube sampling (LHS).
  2. ConfigurationReasoner: Uses the LLM's in-context learning to generate configurations in four steps: context building (retrieving similar historical examples), prompt engineering, configuration generation, and rejection handling (a criticism-synthesis loop that refines the prompt).
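
To make the LHS fallback mentioned for Guider concrete, here is a minimal sketch of Latin hypercube sampling over the unit cube, written with the Python standard library only. It is an illustration of the general technique, not OBELISK's actual code; the function name `latin_hypercube` and its parameters are assumptions made for this example.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=None):
    """Return n_samples points in the unit cube, one per stratum in every dimension.

    Hypothetical helper for illustration; not taken from the OBELISK repository.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Draw one random point inside each of the n_samples equal-width strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-dimension columns into one row per sample.
    return [list(point) for point in zip(*columns)]


samples = latin_hypercube(n_samples=8, n_dims=3, seed=0)
for point in samples:
    print([round(x, 3) for x in point])
```

Each of the 8 points falls in a different eighth of the range along every axis, which is why LHS is a common choice for initial designs when few observations are available.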

The system's core mechanism is a feedback loop: Guider collects observations, ConfigurationReasoner prompts the LLM to generate configurations from the historical context and the Bayesian suggestions, and the measured results are fed back to update the model. The system manages two types of configuration knobs, logical C-knobs (join order) and physical C-knobs (operator costs), both normalized to [0, 1].
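
The normalization of knobs to [0, 1] can be sketched as a pair of mappings between a knob's raw range and the unit interval. This is a minimal sketch under stated assumptions: the `Knob` class, its fields, and the knob name `operator_cost_weight` are hypothetical and do not come from OBELISK or from any real database setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knob:
    """A numeric knob with a raw range (hypothetical structure for illustration)."""
    name: str
    low: float
    high: float

    def __post_init__(self):
        if self.high <= self.low:
            raise ValueError(f"{self.name}: high must be greater than low")

    def to_unit(self, value):
        """Map a raw value to [0, 1], clamping out-of-range input first."""
        clamped = min(max(value, self.low), self.high)
        return (clamped - self.low) / (self.high - self.low)

    def from_unit(self, u):
        """Map a point in [0, 1] back to the raw range, clamping u first."""
        u = min(max(u, 0.0), 1.0)
        return self.low + u * (self.high - self.low)


# Hypothetical physical knob with a raw range of [0, 10].
knob = Knob(name="operator_cost_weight", low=0.0, high=10.0)
unit_value = knob.to_unit(2.5)
raw_value = knob.from_unit(unit_value)
print(unit_value, raw_value)
```

Working in the unit interval lets the optimizer treat all knobs on the same scale, while `from_unit` recovers the value that would actually be applied.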

Section 04

Technical Implementation Details

OBELISK's technical details include:

  • Gaussian Process & Trust Region: TCBO divides the configuration space into sub-regions with independent GP models to capture local structures and handle multi-modal landscapes.
  • Batch Processing & Diversity: Supports batch configuration generation with de-duplication, mixing LLM and Bayesian configurations, and prioritizing LLM results.
  • Fault Tolerance: Falls back to pure Bayesian sampling if LLM is unavailable, uses LHS for initial configurations when data is insufficient, and ensures valid configurations via range checks and default values.
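
The batch and fault-tolerance behavior listed above can be sketched as two small functions: one that repairs a configuration with range checks and defaults, and one that merges LLM and Bayesian candidates with de-duplication while placing LLM results first. This is a hedged illustration rather than OBELISK's implementation; the names `sanitize` and `build_batch`, the rounding-based duplicate test, and the knob names are all assumptions.

```python
import math


def sanitize(config, defaults):
    """Keep only known knobs, replace bad values with defaults, clamp to [0, 1]."""
    clean = {}
    for name, default in defaults.items():
        value = config.get(name, default)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            value = default  # missing, non-numeric, NaN, or infinite
        clean[name] = min(max(float(value), 0.0), 1.0)
    return clean


def build_batch(llm_configs, bo_configs, batch_size, ndigits=4):
    """Merge candidates into one batch: LLM proposals first, no duplicates.

    Passing llm_configs=None models an unavailable LLM, in which case the
    batch is filled from the Bayesian candidates alone.
    """
    batch, seen = [], set()
    for config in list(llm_configs or []) + list(bo_configs):
        # Round so that near-identical points count as duplicates.
        key = tuple(sorted((k, round(v, ndigits)) for k, v in config.items()))
        if key in seen:
            continue
        seen.add(key)
        batch.append(config)
        if len(batch) == batch_size:
            break
    return batch


defaults = {"a": 0.5, "b": 0.5}  # hypothetical normalized knob defaults
llm = [sanitize(c, defaults) for c in [{"a": 0.1, "b": 0.2}, {"a": 0.1, "b": 0.2}]]
bo = [{"a": 0.1, "b": 0.20000001}, {"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.9}]
print(build_batch(llm, bo, batch_size=3))
print(build_batch(None, bo, batch_size=2))
```

Ordering the LLM candidates ahead of the Bayesian ones is one simple way to express the priority described above; a weighted or interleaved mix would be an equally plausible reading.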

Section 05

Practical Application Scenarios

OBELISK is applicable in:

  1. Complex Query Tuning: Reduces latency for long-running analytical queries more efficiently than manual tuning.
  2. Workload Feature Learning: Accumulates historical data to provide fast, accurate suggestions for similar queries.
  3. Database Upgrade/Migration: Helps recalibrate configurations for new environments.
  4. Cloud-Native Elastic Optimization: Pre-generates optimized configurations for different cloud instance specifications to support elastic scaling.

Section 06

Technical Highlights & Innovations

OBELISK's key innovations:

  1. Cross-Paradigm Fusion: Combines Bayesian optimization's mathematical rigor with LLM's semantic reasoning.
  2. Systematic Prompt Engineering: Integrates prompt optimization as a core component to learn from errors.
  3. Modular Design: Clear component division for easy extension and maintenance.
  4. Production Readiness: Includes complete error handling, logging, and configuration management for industrial use.

Section 07

Usage, Deployment & Future Outlook

Usage Steps: Clone the GitHub repo → Install dependencies → Configure LLM API key and DB connection → Prepare SQL workload → Run the main script.

Future Outlook: OBELISK represents an important direction in database query optimization. As LLM capabilities and Bayesian optimization algorithms evolve, the approach is expected to extend to more database performance tuning scenarios and to illustrate the potential of AI-assisted system optimization.