# Imperial College Machine Learning Capstone Project: Practical Strategies for Bayesian Optimization in Black-Box Optimization

> This capstone project from Imperial College's certified Machine Learning and Artificial Intelligence program demonstrates how to apply Bayesian optimization, Gaussian process surrogate models, and combination strategies to find global optimal solutions within a limited number of evaluations for black-box optimization problems.

- 板块: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- 发布时间: 2026-05-21T01:14:23.000Z
- 最近活动: 2026-05-21T01:18:56.104Z
- 热度: 150.9
- 关键词: 黑盒优化, 贝叶斯优化, 高斯过程, 机器学习, 帝国理工, 采集函数, surrogate模型, 实验设计
- 页面链接: https://www.zingnex.cn/en/forum/thread/geo-github-hchagani-aicapstoneproject-ic
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-hchagani-aicapstoneproject-ic
- Markdown 来源: floors_fallback

---

## [Introduction] Core Summary of Imperial College's Bayesian Optimization Practical Project

This project is from the capstone program of Imperial College's certified Machine Learning and Artificial Intelligence course, focusing on black-box optimization problems. It uses Bayesian optimization, Gaussian process (GP) surrogate models, and innovative combination strategies to find global optimal solutions within a limited number of evaluations. The core breakthrough lies in the dual-model architecture of classification and regression GPs, combined with a circular candidate point generation strategy, which effectively balances exploration and exploitation. It has achieved significant results in scenarios such as pollution detection, drug discovery, and model tuning, providing important insights for black-box optimization practice.

## Project Background and Challenges

In machine learning engineering practice, black-box optimization (BBO) scenarios are widespread (e.g., hyperparameter tuning, material design, drug screening). Their characteristics include unknown objective functions, high evaluation costs, and limited samples. The Imperial College capstone project requires students to handle 8 independent black-box optimization problems, find the global maximum within a limited number of function evaluations, simulating real-world expensive experiment scenarios (each evaluation takes 48 hours, and the number of queries is limited).

## Core Methodology: Iteration from Exploration to Combination Strategy

### Iteration Phases
1. **Initial Exploration**: Used long-distance sampling to perceive the function profile, but no obvious structure was found.
2. **Classic Bayesian Optimization**: Used Gaussian process (GP) as the surrogate model (RBF/Matern kernel), and UCB acquisition function to balance exploration and exploitation, but it was too biased towards boundary regions. Linear regression performed poorly, confirming the nonlinear nature.
3. **Combination Strategy Breakthrough**: Built a dual-model architecture—classification GP predicts the probability of positive values (to distinguish signal/noise), regression GP is trained on positive samples to predict the logarithm of output; the acquisition function is the product of the two, ensuring recommended points are in the signal area and have high potential. The circular candidate point strategy (generating candidates with the current optimal as the center and the midpoint of neighbors as the radius) enhances local exploitation.

## Practical Results: Optimization Breakthroughs in Multiple Scenarios

1. **Pollution Detection Function**: Found points with output values several orders of magnitude higher than the initial optimal, and identified two high-potential regions.
2. **Drug Discovery Scenario**: Located and explored promising regions adjacent to high-value points (input is compound ratio, output is negative value of side effects).
3. **Model Parameter Tuning**: Found that two promising regions in the interval 0.6<x0<0.8 merged into a band; the function was insensitive to the x1 parameter, and decision tree partitioning revealed complex regional features.

## Technical Implementation and Engineering Practice Key Points

The project is based on the Python ecosystem and developed using Jupyter Lab; the Gaussian process model relies on mature libraries, supporting switching between kernel functions and acquisition strategies; the code includes data manuals and model cards, reflecting good engineering practices. Input features are normalized to the [0.0,1.0) interval, output remains in the original scale; query submission supports six decimal places of precision, adapting to high-dimensional optimization needs.

## Methodological Insights from the Project

1. **Combination Strategy is Better**: Separating classification and regression tasks, the combination effect is better than a single GP model.
2. **Integrate Domain Knowledge**: Even if the physical meaning of features is unknown, effective priors can be designed through data distribution (e.g., negative values as noise).
3. **Acquisition Strategy Adapted to Local Conditions**: UCB theory is elegant but needs flexible adjustment; dynamically combine multiple strategies.
4. **Art of Decision Under Constraints**: Under limited budget and delayed feedback, balancing exploration of the unknown and deep excavation of promising regions is a core capability.

## Project Summary and Value

This project accurately simulates real industrial scenarios, demonstrates the adaptability of the Bayesian optimization framework, and breaks through complex optimization problems through innovative model combinations and strategy designs. It has direct reference value for engineers in fields such as hyperparameter tuning, experimental design, and automated machine learning.
