Zing Forum

LLM Distillery: An Open-Source Framework for Distilling Large Model Knowledge into Efficient Specialized Classifiers

This article introduces the LLM Distillery framework, showing how to transfer the judgment capabilities of large models such as Gemini Flash to lightweight local models (e.g., Qwen2.5-1.5B) via knowledge distillation, enabling content filtering and multi-dimensional scoring at roughly 100x lower cost and 50x faster inference.

knowledge distillation, LLM, Gemini, Qwen, model distillation, content filtering, multi-dimensional scoring, machine learning, natural language processing
Published 2026-04-02 18:32 · Recent activity 2026-04-02 18:50 · Estimated read 6 min

Section 01

[Introduction] Core Value and Application Scenarios of the LLM Distillery Framework

This article introduces the open-source LLM Distillery framework, which transfers the judgment capabilities of large models like Gemini Flash to lightweight local models (e.g., Qwen2.5-1.5B) via knowledge distillation, achieving roughly 100x lower cost and 50x faster inference. The framework suits scenarios such as content filtering, multi-dimensional scoring, and hierarchical classification, offering an efficient path to putting large-model judgment into production.


Section 02

Background: Pain Points and Solutions for Large Model Deployment

Large Language Models (LLMs) perform excellently in complex judgment tasks, but face high costs and slow inference speeds when deployed in production. LLM Distillery transfers the expertise of large models to small specialized models via knowledge distillation, significantly reducing operational costs and latency while maintaining judgment quality.


Section 03

Framework Workflow and Architecture Design

The core workflow of LLM Distillery includes:

  1. Using Gemini Flash as an "Oracle" to generate training datasets with dimensional scores;
  2. Multi-dimensional regression fine-tuning based on Qwen2.5-7B-Instruct;
  3. Comprehensive data validation to ensure quality;
  4. Local deployment for fast batch inference.

Architecture unification was completed in November 2025: the Oracle outputs only dimensional scores (0-10) and its reasoning, while hierarchical classification is handled by a postfilter, allowing classification thresholds to be adjusted flexibly without re-labeling data.
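
The threshold-based postfilter idea can be sketched as follows. This is a minimal illustration, not the project's actual code: the dimension names and threshold values are hypothetical, and the real postfilter may aggregate scores differently.

```python
def postfilter(scores: dict[str, float],
               threshold_high: float = 7.0,
               threshold_low: float = 4.0) -> str:
    """Map Oracle-style dimensional scores (0-10) to a tier label.

    Because classification happens here rather than in the Oracle's
    labels, the thresholds can be tuned at any time without
    re-labeling or re-training on the underlying data.
    """
    mean_score = sum(scores.values()) / len(scores)
    if mean_score >= threshold_high:
        return "high"
    if mean_score >= threshold_low:
        return "medium"
    return "low"
```

For example, `postfilter({"novelty": 8.0, "rigor": 9.0})` returns `"high"`, and lowering `threshold_high` reclassifies borderline items instantly, with no new labels required.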


Section 04

Deployed Production-Grade Filter Examples

As of November 2025, the project has deployed several filters:

  • Sustainability Technology Filter (sustainability_technology v1): evaluates 6 dimensions based on the LCSA framework, using Qwen2.5-1.5B + LoRA fine-tuning (18.5 million parameters), with a test MAE of 0.690;
  • uplifting v5: evaluates 6 positive-impact dimensions, also based on Qwen2.5-1.5B + LoRA, with a validation MAE of 0.681 and an evidence-gatekeeper mechanism that caps speculative content at a maximum score of 3.0;
  • Investment Risk Filter (investment-risk v4): covers 8 dimensions, with 4,880 validation entries prepared; its guiding philosophy: "Cannot predict crashes, but can be prepared".
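
The evidence-gatekeeper mechanism in uplifting v5 can be sketched like this. The 3.0 cap comes from the article; the evidence-dimension name and the `min_evidence` cutoff are illustrative assumptions.

```python
def apply_evidence_gatekeeper(overall_score: float,
                              evidence_score: float,
                              cap: float = 3.0,
                              min_evidence: float = 5.0) -> float:
    """Clamp the final score for speculative content.

    If the evidence dimension falls below `min_evidence`, the overall
    score is capped at `cap` (3.0 in uplifting v5), so weakly-supported
    claims can never receive a high positive-impact rating.
    """
    if evidence_score < min_evidence:
        return min(overall_score, cap)
    return overall_score
```

A gatekeeper like this keeps a single poorly-evidenced dimension from being averaged away by otherwise strong scores.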

Section 05

Training Data Preparation Workflow

The project provides a complete data toolchain:

  1. prepare_data.py: Supports stratified sampling, splitting data into training set (80%), validation set (10%), and test set (10%);
  2. validate_training_data.py: Checks structural integrity, data distribution, label quality, etc.;
  3. deduplicate_training_data.py: Removes cross-split duplicate data;
  4. Automatically generates validation reports and saves them to the filter directory.
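
The stratified 80/10/10 split performed by prepare_data.py can be sketched in plain Python. This is a simplified illustration: the record schema, label key, and seed are assumptions, and the real script likely handles many more options.

```python
import random
from collections import defaultdict

def stratified_split(records, label_key="label", seed=42):
    """Split records into train/val/test (80/10/10) while keeping each
    label's proportion roughly equal across the three splits."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    train, val, test = [], [], []
    for items in by_label.values():
        rng.shuffle(items)          # shuffle within each stratum
        n_train = int(len(items) * 0.8)
        n_val = int(len(items) * 0.1)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test
```

Splitting within each label stratum, rather than over the whole pool, keeps rare classes from disappearing entirely from the validation or test sets.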

Section 06

Model Training and Deployment Details

During training, Qwen2.5-7B-Instruct serves as the base model, requiring a GPU with 16GB+ VRAM (e.g., RTX 4090 or A100); training takes approximately 2-4 hours. After training, the model can be deployed locally for high-speed batch inference. The project also provides development tooling (such as filter-development guide agents and coordination agents) and a main dataset of 402,000 articles (October-November 2025).
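
A LoRA fine-tuning setup for multi-dimensional regression might look like the configuration sketch below, using the Hugging Face `peft` and `transformers` libraries. All hyperparameters here (rank, alpha, target modules, dropout) are illustrative assumptions; the article does not specify the project's actual LoRA configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# Hypothetical settings -- the project's real hyperparameters are not given.
base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    num_labels=6,                 # one regression output per scored dimension
    problem_type="regression",
)
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="SEQ_CLS",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()    # only the LoRA adapters train
```

Because only the low-rank adapter weights are updated, fine-tuning fits within the 16GB+ VRAM budget mentioned above rather than requiring full 7B-parameter training.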


Section 07

Future Development Directions

The project's next steps include: training the remaining investment risk filter (investment-risk v4), and building a batch processing pipeline for production deployment to support high-volume scoring needs. With the development of more filters, LLM Distillery is expected to become an important open-source tool in the field of content evaluation and classification.