# Conan: A Hybrid Self-Improvement Training Framework for Human-Machine Collaborative Reasoning Models

> Conan is a prototype reasoning-model training project built around an automated closed loop, with human decisions added at key nodes as a supplement. It aims to let models self-improve through hybrid training strategies, using human judgment at critical points to raise training quality.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-02T06:36:38.000Z
- Last activity: 2026-04-02T06:55:03.297Z
- Popularity: 154.7
- Keywords: Conan, reasoning models, hybrid training, human-machine collaboration, automated training, reinforcement learning, SFT, DPO, model self-improvement, training framework
- Page link: https://www.zingnex.cn/en/forum/thread/conan
- Canonical: https://www.zingnex.cn/forum/thread/conan

---

## Conan: Guide to the Hybrid Self-Improvement Training Framework for Human-Machine Collaborative Reasoning Models

Conan is a prototype reasoning-model training project, currently in the MVP phase, that puts an automated closed loop first and adds human decisions at key nodes. Its core goal is a system with clear control flow and module boundaries that achieves model self-improvement through hybrid training strategies, balancing automation efficiency against human-driven quality. The project supports experiment tracking and reproducibility; real components and further features are planned for gradual integration.

## Background and Core Concepts of the Conan Project

Training Large Reasoning Models (LRMs) poses a dilemma: fully automated pipelines lack the guidance of human intuition, while relying entirely on humans does not scale. Conan's core principle is "automation first, human assistance second". Stages such as data generation and automatic evaluation run in an automated closed loop, and human expert decisions are introduced only at key nodes such as reward calibration and failure-mode diagnosis. The aim is to test whether this hybrid strategy outperforms a purely automated baseline while keeping the human cost minimal.

## System Architecture and Core Components of Conan

Conan adopts a modular design, with core components including:
1. **Training Engine**: Coordinates various modules and supports single-round/batch execution;
2. **Task Generator**: A placeholder module in the MVP phase; real task generation logic will be integrated later;
3. **Auto Evaluator**: Evaluates the correctness of model outputs and the rationality of reasoning;
4. **Training Pipeline**: Supports switching between training strategies like SFT, RL, and DPO;
5. **Decision Routing System**: Routes each sample to one of three outcomes: approve (pass automatically), review (send to human review), or block (stop the sample and pause).
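The following minimal Python sketch makes the control flow concrete by showing one cycle through the components listed above. A generator produces a task, a model answers it, an evaluator scores the answer, and a router maps the score to approve, review, or block. Every class and function name, the `model` callable, and the 0.8/0.4 score thresholds are hypothetical illustrations, not Conan's actual API. The generator and evaluator are deliberate placeholders, mirroring the project's MVP status.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    APPROVE = "approve"  # pass automatically
    REVIEW = "review"    # send to the human review queue
    BLOCK = "block"      # stop the sample and pause


@dataclass
class CycleResult:
    cycle: int
    task: str
    output: str
    score: float
    decision: Decision


class PlaceholderTaskGenerator:
    """Stand-in for the MVP task generator: emits numbered dummy tasks."""

    def generate(self, cycle: int) -> str:
        return f"task-{cycle}"


class PlaceholderEvaluator:
    """Stand-in evaluator: returns a seeded pseudo-random score in [0, 1)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def score(self, task: str, output: str) -> float:
        return self._rng.random()


def route(score: float, approve_at: float = 0.8, review_at: float = 0.4) -> Decision:
    """Map an evaluator score to approve, review, or block (hypothetical thresholds)."""
    if score >= approve_at:
        return Decision.APPROVE
    if score >= review_at:
        return Decision.REVIEW
    return Decision.BLOCK


class TrainingEngine:
    """Runs generator -> model -> evaluator -> router, one cycle at a time or in batches."""

    def __init__(self, generator: PlaceholderTaskGenerator,
                 evaluator: PlaceholderEvaluator,
                 model: Callable[[str], str]) -> None:
        self.generator = generator
        self.evaluator = evaluator
        self.model = model  # any callable mapping a task string to an output string

    def run_cycle(self, cycle: int) -> CycleResult:
        task = self.generator.generate(cycle)
        output = self.model(task)
        score = self.evaluator.score(task, output)
        return CycleResult(cycle, task, output, score, route(score))

    def run_batch(self, n_cycles: int) -> list[CycleResult]:
        return [self.run_cycle(i) for i in range(n_cycles)]


if __name__ == "__main__":
    engine = TrainingEngine(PlaceholderTaskGenerator(),
                            PlaceholderEvaluator(seed=42),
                            model=lambda task: f"answer to {task}")
    for result in engine.run_batch(5):
        print(result.cycle, f"{result.score:.2f}", result.decision.value)
```

Keeping routing in a pure function such as `route` lets thresholds be tuned or replaced without touching the engine, in keeping with the module boundaries the project emphasizes.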

## Human Review Mechanism and Intelligent Trigger Strategy

Conan's human review mechanism includes:
- **Review Queue**: Automatically collects samples routed to review or block; experts record their conclusions after reviewing them;
- **Metric Analysis**: Computes the proportions of approve/review/block decisions to track model performance trends and how human intervention is distributed;
- **Intelligent Trigger**: Uses metrics to recommend when humans should intervene (for example, after consecutive failures or reward drift);
- **Strategy Switching Recommendations**: Recommends switching strategies like SFT (correction), RL (fine optimization), and DPO (preference alignment) based on metric changes.
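The review queue, metric analysis, trigger, and strategy-switching ideas above can be sketched as a few small functions. This illustration rests on several assumptions. The trigger rules (three consecutive blocks, or mean reward drifting more than 0.2 from a baseline) and the mapping from decision shares to SFT, DPO, or RL are hypothetical heuristics chosen for clarity. Conan does not document them. All names (`ReviewQueue`, `decision_shares`, `should_trigger_human`, `recommend_strategy`) are likewise hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

DECISIONS = ("approve", "review", "block")


@dataclass
class ReviewQueue:
    """Holds samples routed to review/block until a human expert records a verdict."""
    pending: dict[str, str] = field(default_factory=dict)   # sample_id -> routing decision
    resolved: dict[str, str] = field(default_factory=dict)  # sample_id -> expert verdict

    def enqueue(self, sample_id: str, decision: str) -> None:
        # Only non-approved samples need human attention.
        if decision in ("review", "block"):
            self.pending[sample_id] = decision

    def resolve(self, sample_id: str, verdict: str) -> None:
        # Raises KeyError if the sample was never queued.
        del self.pending[sample_id]
        self.resolved[sample_id] = verdict


def decision_shares(decisions: list[str]) -> dict[str, float]:
    """Proportion of approve/review/block decisions (all zeros for an empty history)."""
    counts = Counter(decisions)
    total = len(decisions) or 1
    return {d: counts[d] / total for d in DECISIONS}


def should_trigger_human(decisions: list[str], rewards: list[float],
                         max_consecutive_blocks: int = 3,
                         baseline_reward: float | None = None,
                         drift_tolerance: float = 0.2) -> list[str]:
    """Return reasons to request human intervention (empty list = keep automating)."""
    reasons = []
    streak = 0
    for d in reversed(decisions):  # count trailing consecutive "block" decisions
        if d != "block":
            break
        streak += 1
    if streak >= max_consecutive_blocks:
        reasons.append(f"{streak} consecutive blocked samples")
    if baseline_reward is not None and rewards:
        mean_reward = sum(rewards) / len(rewards)
        if abs(mean_reward - baseline_reward) > drift_tolerance:
            reasons.append(f"reward drift: mean {mean_reward:.2f} vs baseline {baseline_reward:.2f}")
    return reasons


def recommend_strategy(shares: dict[str, float],
                       block_limit: float = 0.3, review_limit: float = 0.3) -> str:
    """Hypothetical heuristic mapping decision shares to a training strategy."""
    if shares["block"] > block_limit:
        return "SFT"  # many outright failures: correct with supervised examples
    if shares["review"] > review_limit:
        return "DPO"  # many borderline cases: align with human preference verdicts
    return "RL"       # mostly passing: fine-grained optimization


if __name__ == "__main__":
    history = ["approve", "review", "block", "block", "block"]
    shares = decision_shares(history)
    print(shares)
    print(should_trigger_human(history, [0.9, 0.5, 0.2, 0.1, 0.1], baseline_reward=0.7))
    print(recommend_strategy(shares))
```

Under these heuristics, the sample history (ending in three blocks, with rewards falling well below the baseline) yields two trigger reasons and an SFT recommendation.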

## Technical Implementation Details of Conan

Technical details of Conan:
- **Development Environment**: Python 3.10+, the pytest testing framework, and project management via `pyproject.toml`;
- **Code Structure**: `src/hybrid_trainer` contains modules such as `engine.py` (training engine) and `evaluation.py` (evaluation);
- **MVP Status**: The current focus is control-flow correctness and module boundaries; the task generator, evaluator, and other components are placeholder implementations;
- **Experiment Tracking**: Records cycle information, evaluation metrics, human interventions, and similar data, and exports them in JSONL format for reproducibility.
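A JSONL experiment log can be as simple as appending one JSON object per cycle. The sketch below assumes a hypothetical record schema (`cycle`, `timestamp`, `config`, `metrics`, `human_interventions`) and a hypothetical `ExperimentTracker` class. Neither is taken from Conan's source. It uses only the Python standard library.

```python
from __future__ import annotations

import json
import tempfile
import time
from pathlib import Path


class ExperimentTracker:
    """Appends one JSON object per training cycle to a JSONL file (hypothetical schema)."""

    def __init__(self, path: str | Path, run_config: dict) -> None:
        self.path = Path(path)
        # Stored in every record so each line is self-describing (seed, strategy, thresholds...).
        self.run_config = run_config

    def log_cycle(self, cycle: int, metrics: dict, interventions: list[dict]) -> dict:
        record = {
            "cycle": cycle,
            "timestamp": time.time(),
            "config": self.run_config,
            "metrics": metrics,
            "human_interventions": interventions,
        }
        # Append mode: earlier cycles are never rewritten.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, sort_keys=True, ensure_ascii=False) + "\n")
        return record

    def load(self) -> list[dict]:
        """Read all records back, skipping blank lines."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tracker = ExperimentTracker(Path(tmp) / "run.jsonl", {"seed": 42, "strategy": "SFT"})
        tracker.log_cycle(0, {"approve": 0.6, "review": 0.3, "block": 0.1}, [])
        tracker.log_cycle(1, {"approve": 0.5, "review": 0.3, "block": 0.2},
                          [{"sample_id": "task-1", "verdict": "flawed reasoning"}])
        for rec in tracker.load():
            print(rec["cycle"], rec["metrics"], len(rec["human_interventions"]))
```

Each record stores the run configuration (for example, the random seed), and `sort_keys=True` fixes the key order, so every line is self-describing and diffable across runs. The wall-clock `timestamp` is the only field expected to differ between otherwise identical runs.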

## Future Development Plan of the Conan Project

Development plan of Conan:
- **Short-term Goals**: Integrate real components, configure reward strategies, and integrate training executors;
- **Mid-term Goals**: Develop a graphical interface for human decision-making, support custom trigger rules, and extend support to multiple models;
- **Long-term Vision**: Become core infrastructure for reasoning-model training, providing a complete toolchain for human-machine collaborative training.

## Industry Insights and Summary of Conan

Industry insights from Conan:
1. **Human-machine collaboration is an inevitable path**: With current technology, having human experts intervene at key decision points can improve training quality;
2. **Observability is crucial**: Metric aggregation and experiment tracking make the training state visible and support sound decisions;
3. **Modular design promotes iteration**: Independent components can be replaced and evolved quickly.

Summary: Conan explores reasoning-model training by organizing human-machine collaboration into a systematic framework. Although it is still in the MVP phase, it shows considerable potential and may help push the boundaries of model capabilities.
