Zing Forum

Second Brain: An LLM Experiment Platform for SFT, RLHF, and RAG

An LLM experiment environment designed specifically for AI engineers and researchers, supporting end-to-end experiments for Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Retrieval-Augmented Generation (RAG). It features parallel inference, blind test evaluation, and dataset generation capabilities.

Tags: LLM, SFT, RLHF, RAG, FastAPI, pgvector, Model Evaluation, Dataset Generation, Blind Testing, Domain-Driven Design
Published 2026-04-21 00:43 | Recent activity 2026-04-21 00:50 | Estimated read 6 min

Section 01

[Overview] Second Brain: An LLM Experiment Platform for SFT, RLHF, and RAG

Second Brain is an open-source LLM experiment environment designed for AI engineers and researchers, integrating end-to-end experiments for Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Retrieval-Augmented Generation (RAG). It addresses efficiency issues in traditional fragmented workflows and provides core features such as parallel inference, blind test evaluation, and dataset generation to support systematic model experimentation and optimization.


Section 02

Background: Pain Points of Traditional LLM Experiments and Platform Design Philosophy

Traditional LLM experiments require switching between multiple tools (scripted API calls, spreadsheets for recording evaluations, text editors for organizing data), which is inefficient and prone to human error. Second Brain follows a 'from test console to scientific laboratory' philosophy, encapsulating the entire experiment workflow in a single web application. The platform uses a Domain-Driven Design (DDD) architecture: the backend is built on FastAPI, the data layer uses PostgreSQL with pgvector for vector search, and the frontend supports mathematical formula rendering.
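To make the pgvector piece concrete, here is a minimal sketch of a metadata-filtered similarity query like the one the data layer performs. The table name (`documents`), column names, and the deterministic `id` tie-break are illustrative assumptions, not the platform's actual schema:

```python
# Hypothetical sketch of a pgvector similarity search with metadata
# pre-filtering; table and column names are illustrative, not the
# platform's real schema.
from typing import Optional

def build_semantic_search_sql(filter_chapter: Optional[str] = None,
                              top_k: int = 5) -> str:
    """Build a SQL query: apply the metadata filter first, then rank
    by cosine distance (pgvector's `<=>` operator), with an `id`
    tie-break so the ordering is reproducible across runs."""
    where = "WHERE chapter = %(chapter)s" if filter_chapter else ""
    return (
        "SELECT id, content, embedding <=> %(query_vec)s AS distance "
        f"FROM documents {where} "
        "ORDER BY distance, id "  # id tie-break keeps ordering stable
        f"LIMIT {int(top_k)}"
    )
```

The query vector and chapter value would be passed as driver parameters (e.g. psycopg placeholders), never interpolated into the SQL string.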


Section 03

Core Features: Parallel Inference, Blind Test Evaluation, and Data Closed-Loop

  1. Deterministic Parallel Inference: Sends requests to two models or prompts simultaneously with identical parameters and RAG context, eliminating timing noise and making A/B tests scientifically rigorous;
  2. Blind Test Evaluation: Hides each model's real identity, showing only 'Model A/B', to avoid brand bias and keep evaluations objective;
  3. Semantic-Level Text Comparison: Highlights output differences using jsdiff, making hallucinations and omissions easy to spot;
  4. Gold Standard Dataset Export: Exports evaluation results in JSONL, compatible with mainstream training frameworks, shortening the experiment-to-training loop;
  5. Advanced RAG Pipeline: Supports metadata pre-filtering (document chapter, date, etc.) plus vector re-ranking, with deterministic sorting for reproducible experiments.
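The deterministic parallel inference in point 1 can be sketched as an async fan-out that hands both models the same prompt and the same frozen parameters. The function and parameter names here are assumptions for illustration, not the platform's actual API:

```python
# Minimal sketch of deterministic parallel A/B inference, assuming each
# model is exposed as an async callable; `ModelFn` and the parameter
# handling are illustrative, not the platform's real interface.
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

ModelFn = Callable[[str, Dict], Awaitable[str]]

async def parallel_ab_inference(model_a: ModelFn, model_b: ModelFn,
                                prompt: str, params: Dict) -> Tuple[str, str]:
    """Send the same prompt and identical parameters to both models at
    once, so neither run sees different context or timing."""
    shared = dict(params)  # copy so neither call can mutate the other's view
    out_a, out_b = await asyncio.gather(
        model_a(prompt, shared), model_b(prompt, dict(shared))
    )
    return out_a, out_b
```

Running both calls under one `asyncio.gather` is what removes the time gap a sequential A-then-B comparison would introduce.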

Section 04

Technical Architecture: DDD Layered Design and Scalability

The platform uses a DDD layered architecture, with code divided into 5 layers:

  • api/: FastAPI routing layer for handling HTTP requests;
  • core/: Configuration and environment variable management;
  • repositories/: Database interaction layer, encapsulating pgvector semantic search;
  • schemas/: Pydantic models responsible for data validation and serialization;
  • services/: Core business logic (LLM orchestrator, RAG pipeline).

The LLM orchestration layer is built on an abstract interface: it supports Ollama local models by default and reserves extension points for other providers, which keeps the code easy to maintain and extend.
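The provider abstraction described above might look like the following sketch. The class names (`LLMProvider`, `OllamaProvider`) and the `generate` signature are assumptions for illustration; the Ollama call itself is stubbed out to keep the example self-contained:

```python
# Sketch of an abstract LLM provider interface like the one the text
# describes; names and the method signature are illustrative assumptions.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Services depend on this interface rather than a concrete backend,
    so new providers can be added without touching business logic."""
    @abstractmethod
    def generate(self, prompt: str, **params) -> str: ...

class OllamaProvider(LLMProvider):
    """Default local backend; a real implementation would call the
    Ollama HTTP API. Stubbed here to keep the sketch runnable."""
    def __init__(self, model: str = "llama3"):
        self.model = model

    def generate(self, prompt: str, **params) -> str:
        raise NotImplementedError("call the local Ollama API here")
```

A new provider only needs to subclass `LLMProvider` and implement `generate`; the orchestrator code never changes.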

Section 05

Applicable Scenarios: Covering Full-Cycle Needs of Model Development

Second Brain is suitable for the following scenarios:

  • Data preparation before model fine-tuning: Collect human preferences through blind tests and generate RLHF comparison data pairs;
  • Prompt engineering optimization: Parallel comparison of different prompt effects for data-driven decision-making;
  • RAG system tuning: Test the impact of different retrieval strategies and re-ranking algorithms;
  • Model capability benchmarking: Establish an internal evaluation system to track iteration progress.
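The first scenario, turning blind-test votes into RLHF comparison pairs, can be sketched as below. The `prompt`/`chosen`/`rejected` field names follow a common preference-data convention and are not necessarily the platform's exact export format:

```python
# Hedged sketch of converting one blind-test vote into an RLHF
# preference pair in JSONL; field names follow a common convention,
# not necessarily the platform's actual export schema.
import json

def vote_to_preference_pair(prompt: str, output_a: str, output_b: str,
                            winner: str) -> str:
    """Map a blind-test vote ('A' or 'B') to one JSONL line, with the
    preferred output as `chosen` and the other as `rejected`."""
    if winner not in ("A", "B"):
        raise ValueError("winner must be 'A' or 'B'")
    chosen, rejected = (output_a, output_b) if winner == "A" else (output_b, output_a)
    return json.dumps({"prompt": prompt, "chosen": chosen, "rejected": rejected})
```

One such line per evaluated vote yields a JSONL file that preference-tuning frameworks can consume directly.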

Section 06

Conclusion: The Value of an Engineering Experiment Platform

Second Brain elevates LLM experimentation from ad-hoc scripts to an engineered platform. It is not just a collection of tools but a complete methodology: every step from experiment design to data output has been carefully polished, helping teams improve model performance systematically. It is an open-source project worth exploring.