BeautyBrain: Distilling Gemini's Reasoning Capabilities into 4B Open-Source Models to Build an Intelligent Beauty Brand Extraction System

An efficient beauty brand extraction system that transfers Gemini's reasoning capabilities to Qwen 2.5 4B/7B small models via knowledge distillation, achieving faster inference and higher accuracy than the original models, and automatically identifying beauty brands and their categories in social media content.

Tags: Knowledge Distillation · Qwen · Gemini · Beauty Brand Extraction · AWQ Quantization · LoRA Fine-Tuning · NER · Multi-Task Learning · Social Media Analysis
Published 2026-04-11 21:33 · Recent activity 2026-04-11 21:49 · Estimated read: 8 min

Section 01

BeautyBrain Project Overview: Distilling Gemini's Reasoning into Open-Source Small Models for Beauty Brand Extraction

BeautyBrain is an efficient beauty brand extraction system that transfers Gemini's reasoning capabilities to Qwen 2.5 4B/7B open-source models via knowledge distillation. It achieves faster inference and higher accuracy than the original models, automatically identifying beauty brands and their categories in social media content. The project addresses the cost and latency of closed-source large-model APIs while maintaining high performance.


Section 02

Background: Challenges of Traditional Beauty Brand Extraction Methods

In beauty industry social media analysis, brand extraction is critical. Traditional methods have limitations:

  1. Rule matching: Relies on large brand dictionaries but fails to handle variants (e.g., SK-II/SK2/sk-ii), emerging brands, or context ambiguity (see the sketch after this list).
  2. Closed-source API calls (Gemini/GPT-4): High accuracy, but costly, slow, and privacy-sensitive; for real-time processing of massive social media streams, a pure-API solution is often prohibitively expensive. BeautyBrain instead transfers that reasoning capability to locally deployed small models while retaining high accuracy.
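
To make the first limitation concrete, here is a minimal, purely illustrative sketch of dictionary-based matching; the brand list and matching logic are hypothetical, not taken from the project:

```python
import re

# Toy brand dictionary, as a rule-based extractor might maintain.
BRAND_DICT = {"SK-II", "La Mer", "Estee Lauder"}

def extract_brands_rule_based(text: str) -> set[str]:
    """Return dictionary brands found via exact substring matching."""
    return {brand for brand in BRAND_DICT
            if re.search(re.escape(brand), text)}

# Exact matching misses the surface variants the article mentions:
print(extract_brands_rule_based("Love this SK-II essence"))  # {'SK-II'}
print(extract_brands_rule_based("my sk2 routine is simple")) # set(): variant missed
print(extract_brands_rule_based("trying sk-ii tonight"))     # set(): casing missed
```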

Section 03

Core Approach: Knowledge Distillation + Multi-Task Learning

BeautyBrain uses knowledge distillation to transfer Gemini 2.5 Flash's reasoning to Qwen 2.5 models, with a multi-task learning architecture that jointly optimizes five objectives (a minimal sketch of the task heads follows the list):

  1. BIO sequence tagging: Precisely identify brand boundaries (e.g., "Love this SK-II essence" → [O, O, B-brand, I-brand, O], with "SK-II" split across two subword tokens).
  2. Brand count prediction: Predict number of brands (0/1/2/3+) to understand context complexity.
  3. Span extraction & attention pooling: Weighted pooling of brand span tokens via attention for better semantic representation.
  4. Multi-brand interaction modeling: Use multi-head attention to model relationships between multiple brands (e.g., SK-II vs La Mer).
  5. Knowledge base alignment: Contrastive learning aligns extracted brands with standard KB entries for alias normalization (SK2→SK-II) and category consistency.
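
A minimal PyTorch sketch of what such task heads could look like on top of a shared encoder. The BIO tag set and 0/1/2/3+ count buckets follow the article; the hidden size and all module shapes are assumptions, and the contrastive KB-alignment head is omitted for brevity:

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Illustrative task heads over a shared encoder's hidden states."""

    def __init__(self, hidden_size: int = 1024):
        super().__init__()
        # 1. BIO sequence tagging: per-token logits for O / B-brand / I-brand.
        self.bio_head = nn.Linear(hidden_size, 3)
        # 2. Brand count prediction: sentence-level 4-way classification (0/1/2/3+).
        self.count_head = nn.Linear(hidden_size, 4)
        # 3. Attention pooling: a learned query scores each span token.
        self.span_query = nn.Linear(hidden_size, 1)
        # 4. Multi-brand interaction: attention across pooled brand vectors.
        self.brand_attn = nn.MultiheadAttention(hidden_size, num_heads=8,
                                                batch_first=True)

    def pool_span(self, span_hidden: torch.Tensor) -> torch.Tensor:
        """Attention-weighted pooling over one (span_len, hidden) slice."""
        weights = torch.softmax(self.span_query(span_hidden), dim=0)
        return (weights * span_hidden).sum(dim=0)

    def brand_interactions(self, brand_vecs: torch.Tensor) -> torch.Tensor:
        """Self-attention over pooled brand vectors: (batch, n_brands, hidden)."""
        out, _ = self.brand_attn(brand_vecs, brand_vecs, brand_vecs)
        return out

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size) from the encoder.
        bio_logits = self.bio_head(hidden_states)                  # per-token tags
        count_logits = self.count_head(hidden_states.mean(dim=1))  # pooled sentence
        return bio_logits, count_logits
```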

Section 04

Training Strategy & Deployment Optimization

Data Pipeline:

  1. Collect 5,000 TikTok posts.
  2. Label with Gemini 2.5 Flash (see the labeling sketch below).
  3. Manual correction (MTurk + internal team).
  4. Final dataset: 4,500 training examples.
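
A hedged sketch of what the Gemini labeling step (step 2) could look like with the google-generativeai SDK; the prompt and output schema are illustrative assumptions, not the project's actual ones:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
teacher = genai.GenerativeModel("gemini-2.5-flash")

# Hypothetical labeling prompt; the real instructions and schema
# are not published in the article.
PROMPT = (
    "Extract every beauty brand mentioned in the post below. "
    'Return JSON like {"brands": [{"name": "...", "category": "..."}]}.\n\n'
    "Post: "
)

def label_post(post: str) -> str:
    """Ask the teacher to label one post; output is corrected manually later."""
    response = teacher.generate_content(PROMPT + post)
    return response.text
```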
Training Stages:

  • Warmup (0-0.5 epochs): Linear LR ramp from 1e-4 → 5e-4.
  • Stable phase (0.5-3 epochs): Train LoRA adapters only; base model frozen.
  • Progressive unfreezing (3-4 epochs): Unfreeze the last 6 Transformer layers; LR 5e-4 → 2e-4.
  • Fine-tuning (4-5 epochs): Full LoRA + task heads at LR 2e-4.

LoRA Config: r=128, target modules q_proj/v_proj, reducing trainable parameters to <0.1% of full fine-tuning.

Quantization: AWQ 4-bit cuts model size from 8 GB to 2.5 GB with <3% accuracy loss (both steps are sketched below).
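
A sketch of the stated LoRA configuration with peft, followed by AWQ quantization with autoawq. The r=128, q_proj/v_proj, and 4-bit settings come from the article; the checkpoint names, lora_alpha, dropout, and AWQ group size are assumptions:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# --- LoRA setup (7B checkpoint used as a stand-in for the article's 4B). ---
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")
lora_cfg = LoraConfig(r=128, target_modules=["q_proj", "v_proj"],
                      lora_alpha=256, lora_dropout=0.05,
                      task_type="CAUSAL_LM")
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # should report well under 1% trainable

# ... staged training loop goes here (warmup -> LoRA-only -> unfreezing) ...

# Merge adapters into dense weights so the model can be quantized as a
# single checkpoint (assumption: the project quantizes the merged model).
model.merge_and_unload().save_pretrained("merged-model")

# --- AWQ 4-bit quantization; group size and GEMM kernel are the
# library's common defaults, not confirmed project settings. ---
from awq import AutoAWQForCausalLM

awq_model = AutoAWQForCausalLM.from_pretrained("merged-model")
tokenizer = AutoTokenizer.from_pretrained("merged-model")
awq_model.quantize(tokenizer, quant_config={"zero_point": True,
                                            "q_group_size": 128,
                                            "w_bit": 4, "version": "GEMM"})
awq_model.save_quantized("beautybrain-awq")
```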

Section 05

Performance Comparison: BeautyBrain vs Original Models

Tested on an RTX 3060 (batch=1, 1,000 samples):

| Metric | Gemini 2.5 Flash | Qwen 2.5 4B (Original) | BeautyBrain (AWQ 4-bit) |
| --- | --- | --- | --- |
| Beauty Detection F1 | 0.82 | 0.71 | 0.87 |
| Brand Extraction EM | 0.74 | 0.58 | 0.84 |
| Category Accuracy | 0.81 | 0.69 | 0.86 |
| Inference Latency | ~2.1 s | ~0.8 s | ~0.35 s |
| Model Size | API | 8 GB | 2.5 GB |
BeautyBrain outperforms both the teacher (Gemini 2.5 Flash) and the original Qwen model on all three quality metrics, with roughly 6x lower latency than the Gemini API and a ~3x smaller footprint than the unquantized model.
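
For reference, a rough way to reproduce a batch=1 latency measurement on a quantized checkpoint; the path and test prompt are placeholders, not the project's benchmark harness:

```python
import time
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Load the hypothetical quantized checkpoint; fuse_layers speeds up decoding.
model = AutoAWQForCausalLM.from_quantized("beautybrain-awq", fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained("beautybrain-awq")

inputs = tokenizer("Love this SK-II essence", return_tensors="pt").to("cuda")

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=64)
print(f"batch=1 latency: {time.perf_counter() - start:.2f}s")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```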

Section 06

Practical Application Scenarios

BeautyBrain supports multiple use cases:

  1. Social media monitoring: Real-time analysis of TikTok/Instagram/Xiaohongshu to extract brands and generate brand voice reports.
  2. Competitor analysis: Identify competing brands (e.g., SK-II vs La Mer) to analyze market positioning.
  3. Trend discovery: Monitor emerging brand mentions to detect early market trends.
  4. User profiling: Combine brand extraction with user behavior data for precise interest portraits.

Section 07

Limitations & Future Outlook

Current Limitations:

  • Mainly supports English; Chinese/Japanese/Korean brand support is under development.
  • Dependent on predefined KB; new brands require KB updates.
  • Batch processing is not yet optimized.

Future Plans:
  • Support Instagram Reels/YouTube Shorts.
  • Multi-language brand extraction (Chinese/Korean/Japanese).
  • Kafka-based real-time streaming inference.
  • Human-machine collaborative Web UI for correction.

Section 08

Conclusion & Key Takeaways

BeautyBrain is a case study in "downsizing" large-model capability: distilling closed-source LLM reasoning into open-source small models while balancing quality against cost and latency. For enterprises that need on-premises NLP deployment, the "distillation + quantization + LoRA" combination is a valuable reference: with careful design, even a 4B model can exceed commercial API performance in a specific domain. The project is open-source under the MIT license, including full training code, inference framework, and deployment scripts, making it a useful starting point for vertical-domain LLM applications.