
Conan: A Hybrid Self-Improvement Training Framework for Human-Machine Collaborative Reasoning Models

Conan is a prototype project for training reasoning models. It runs primarily as an automated closed loop and supplements that loop with human decisions at key nodes. The aim is model self-improvement through hybrid training strategies, with human input at critical points to raise training quality.

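Tags: Conan, reasoning models, hybrid training, human-machine collaboration, automated training, reinforcement learning, SFT, DPO, model self-improvement, training framework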

Published 2026-04-02 14:36 | Recent activity 2026-04-02 14:55 | Estimated read 7 min


Section 01

Conan: Guide to the Hybrid Self-Improvement Training Framework for Human-Machine Collaborative Reasoning Models

Conan is a prototype project for training reasoning models, currently in the MVP phase. It runs primarily as an automated closed loop, supplemented by human decision-making at key nodes. Its core goal is to build a system with clear control flow and module boundaries, achieve model self-improvement through hybrid training strategies, and balance the efficiency of automation against the quality that human judgment provides. The project supports experiment tracking and reproducibility; real components and further functionality are planned for gradual integration.

Section 02

Background and Core Concepts of the Conan Project

Training Large Reasoning Models (LRMs) poses a dilemma: fully automated processes lack the guidance of human intuition, while complete reliance on humans is difficult to scale. Conan's core concept is 'automation first, human assistance second'. Stages such as data generation and automatic evaluation run in an automated closed loop, and human expert decisions are introduced only at key nodes such as reward calibration and failure mode diagnosis. The aim is to verify, at minimal cost, whether the hybrid strategy outperforms a purely automatic baseline.

Section 03

System Architecture and Core Components of Conan

Conan adopts a modular design, with core components including:

Training Engine: Coordinates various modules and supports single-round/batch execution;
Task Generator: A placeholder module in the MVP phase; real task generation logic will be integrated later;
Auto Evaluator: Evaluates whether model outputs are correct and whether the reasoning behind them is sound;
Training Pipeline: Supports switching between training strategies like SFT, RL, and DPO;
Decision Routing System: Routes each sample to one of three outcomes: approve (auto-pass), review (human review), or block (block/pause).
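The interaction between the engine, the placeholder components, and the router can be sketched as follows. This is a minimal illustration, not Conan's actual code: the class and function names, the score range of [0, 1], and the two thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    APPROVE = "approve"  # auto-pass
    REVIEW = "review"    # send to human review
    BLOCK = "block"      # block/pause


@dataclass
class CycleResult:
    task: str
    output: str
    score: float
    decision: Decision


def route(score: float, approve_at: float = 0.8, block_below: float = 0.3) -> Decision:
    """Map an evaluator score in [0, 1] to a decision (thresholds are assumed)."""
    if score >= approve_at:
        return Decision.APPROVE
    if score < block_below:
        return Decision.BLOCK
    return Decision.REVIEW


class TrainingEngine:
    """Hypothetical engine: generator -> model -> evaluator -> router."""

    def __init__(
        self,
        generate_task: Callable[[], str],
        run_model: Callable[[str], str],
        evaluate: Callable[[str, str], float],
    ) -> None:
        self.generate_task = generate_task
        self.run_model = run_model
        self.evaluate = evaluate

    def run_cycle(self) -> CycleResult:
        """Execute a single round."""
        task = self.generate_task()
        output = self.run_model(task)
        score = self.evaluate(task, output)
        return CycleResult(task, output, score, route(score))

    def run_batch(self, n: int) -> List[CycleResult]:
        """Execute n rounds in sequence."""
        return [self.run_cycle() for _ in range(n)]


# Placeholder components, in the spirit of the MVP phase.
scores = iter([0.9, 0.5, 0.1])
engine = TrainingEngine(
    generate_task=lambda: "2 + 2 = ?",
    run_model=lambda task: "4",
    evaluate=lambda task, output: next(scores),
)
results = engine.run_batch(3)
print([r.decision.value for r in results])  # ['approve', 'review', 'block']
```

Passing the components in as plain callables keeps the module boundaries explicit, so a placeholder can later be swapped for a real implementation without touching the engine.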

Section 04

Human Review Mechanism and Intelligent Trigger Strategy

Conan's human review mechanism includes:

Review Queue: Automatically collects samples routed to review or block; experts record their conclusions once the review is done;
Metric Analysis: Computes the proportions of approve/review/block decisions to show model performance trends and where human intervention is concentrated;
Intelligent Trigger: Uses these metrics to recommend points for human intervention, such as runs of consecutive failures or reward drift;
Strategy Switching Recommendations: Recommends switching between strategies such as SFT (correction), RL (fine optimization), and DPO (preference alignment) based on metric changes.
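The metric analysis, trigger, and strategy recommendation described above might look roughly like this. It is a sketch under stated assumptions: the function names, the window size, the thresholds, and the rule that maps decision shares to a strategy are all hypothetical, since the article does not specify how Conan computes them.

```python
from collections import Counter
from typing import Dict, List, Sequence


def decision_shares(decisions: Sequence[str]) -> Dict[str, float]:
    """Proportion of approve/review/block among routed samples."""
    counts = Counter(decisions)
    total = len(decisions)
    return {
        key: (counts[key] / total if total else 0.0)
        for key in ("approve", "review", "block")
    }


def trailing_failures(decisions: Sequence[str]) -> int:
    """Count consecutive non-approve decisions at the end of the history."""
    n = 0
    for decision in reversed(decisions):
        if decision == "approve":
            break
        n += 1
    return n


def reward_drift(rewards: Sequence[float], window: int = 5) -> float:
    """Mean of the last `window` rewards minus the mean of the window before it."""
    if len(rewards) < 2 * window:
        return 0.0  # not enough history to compare two windows
    recent = rewards[-window:]
    previous = rewards[-2 * window:-window]
    return sum(recent) / window - sum(previous) / window


def intervention_reasons(
    decisions: Sequence[str],
    rewards: Sequence[float],
    max_failures: int = 3,    # assumed threshold
    max_drift: float = 0.2,   # assumed threshold
) -> List[str]:
    """Return the reasons, if any, for recommending human intervention."""
    reasons = []
    if trailing_failures(decisions) >= max_failures:
        reasons.append("continuous failures")
    if abs(reward_drift(rewards)) > max_drift:
        reasons.append("reward drift")
    return reasons


def recommend_strategy(shares: Dict[str, float]) -> str:
    """Hypothetical mapping from decision shares to a training strategy."""
    if shares["block"] > 0.3:
        return "SFT"  # many hard failures: correct with supervised data
    if shares["review"] > 0.3:
        return "DPO"  # many borderline cases: align on reviewer preferences
    return "RL"       # mostly passing: fine optimization


decisions = ["approve", "review", "block", "block", "review"]
rewards = [0.8] * 5 + [0.4] * 5
shares = decision_shares(decisions)
print(intervention_reasons(decisions, rewards))  # ['continuous failures', 'reward drift']
print(recommend_strategy(shares))                # SFT
```

The triggers only recommend; whether to pause the loop or switch strategy remains a human decision.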

Section 05

Technical Implementation Details of Conan

Technical details of Conan:

Development Environment: Python 3.10+, tested with pytest, with project configuration managed via pyproject.toml;
Code Structure: src/hybrid_trainer includes modules like engine.py (training engine) and evaluation.py (evaluation);
MVP Status: The current focus is on control flow correctness and module boundaries; components such as the task generator and evaluator are placeholder implementations;
Experiment Tracking: Records cycle information, evaluation metrics, and human interventions, and exports them in JSONL format to support reproducibility.
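A JSONL export of this kind can be sketched with the standard library alone. The record fields and function names below are assumptions chosen to mirror the list above (cycle information, evaluation metrics, human intervention); the article does not document Conan's actual schema.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class CycleRecord:
    """One line of the experiment log (hypothetical schema)."""
    cycle: int
    strategy: str                     # e.g. "SFT", "RL", "DPO"
    score: float                      # evaluation metric for this cycle
    decision: str                     # "approve", "review", or "block"
    human_note: Optional[str] = None  # reviewer conclusion, if any


def export_jsonl(records: List[CycleRecord], path: Path) -> None:
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def load_jsonl(path: Path) -> List[CycleRecord]:
    """Read the log back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [CycleRecord(**json.loads(line)) for line in fh if line.strip()]


records = [
    CycleRecord(cycle=1, strategy="SFT", score=0.9, decision="approve"),
    CycleRecord(cycle=2, strategy="SFT", score=0.5, decision="review",
                human_note="reasoning skips a step"),
]

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "run.jsonl"
    export_jsonl(records, log_path)
    restored = load_jsonl(log_path)

print(restored == records)  # True
```

Because each line is an independent JSON object, a run can be appended to incrementally and inspected with ordinary line-oriented tools.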

Section 06

Future Development Plan of the Conan Project

Development plan of Conan:

Short-term Goals: Integrate real components, configure reward strategies, and integrate training executors;
Mid-term Goals: Develop a graphical human decision-making interface, support custom trigger rules, and expand multi-model support;
Long-term Vision: Become shared infrastructure for reasoning model training and provide a complete toolchain for human-machine collaborative training.

Section 07

Industry Insights and Summary of Conan

Industry insights from Conan:

Human-machine collaboration remains necessary: With current technology, involving human experts at key decision points can improve training quality;
Observability is crucial: Metric aggregation and experiment tracking make the training status visible and support sound decisions;
Modular design promotes iteration: Independent components are easy to replace and can evolve quickly.

Summary: Conan is an exploratory project in reasoning model training that implements human-machine collaboration through a systematic framework. It is still in the MVP phase, with several components implemented as placeholders, but its design offers a basis for testing whether hybrid training can extend model capabilities.
