# CogLang Drug Distill: Edge-side Large Model Distillation and Syntax-Constrained Secure Query System

> This project demonstrates how to distill knowledge from a large model into a 1.5B-parameter model that generates CogLang queries over a drug safety graph on devices with about 1GB of memory, using QLoRA fine-tuning, GGUF quantization, and GBNF syntax constraints.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-01T08:44:44.000Z
- Last activity: 2026-06-01T08:54:27.940Z
- Popularity: 163.8
- Keywords: model distillation, edge-side inference, QLoRA, GGUF, GBNF, CogLang, knowledge graph, syntax constraints, quantization, edge AI
- Page link: https://www.zingnex.cn/en/forum/thread/coglang-drug-distill
- Canonical: https://www.zingnex.cn/forum/thread/coglang-drug-distill

---

## Introduction: Core Overview of the CogLang Drug Distill Project

The project was published on GitHub by zhpy2004 (https://github.com/zhpy2004/coglang-drug-distill). It distills knowledge from a large model into a 1.5B-parameter model that generates CogLang queries over a drug safety graph on devices with about 1GB of memory, combining QLoRA fine-tuning, GGUF quantization, and GBNF syntax constraints. It covers the full pipeline from data generation to edge-side deployment and is intended as a reusable blueprint for edge AI applications.

## Project Background: Challenges and Needs of Edge AI

As LLMs have become more capable, edge-side deployment has become a hot topic, because cloud-based inference suffers from high latency, privacy risks, and no offline availability. Running large models on edge devices faces four major challenges: memory constraints (only 1-2GB is allocated to AI on mobile phones), compute limitations (mobile hardware is much weaker than data-center hardware), power constraints (continuous inference drains the battery quickly), and safety constraints (the medical field requires zero hallucinations and auditability). This project is a technical demonstration that addresses these challenges.
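
The memory constraint can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a nominal count of exactly 1.5B parameters (the real checkpoint may differ slightly) and uses the 935MiB Q4_K_M file size reported in the methods section. It counts weight storage only and ignores the KV cache and runtime overhead.

```python
MIB = 1024 ** 2  # bytes per MiB

def weights_mib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in MiB (no KV cache, no runtime overhead)."""
    return n_params * bits_per_weight / 8 / MIB

def effective_bits(n_params: float, file_mib: float) -> float:
    """Average bits per weight implied by a given file size."""
    return file_mib * MIB * 8 / n_params

N_PARAMS = 1.5e9  # assumption: nominal parameter count

fp16_mib = weights_mib(N_PARAMS, 16)         # unquantized half precision
int4_mib = weights_mib(N_PARAMS, 4)          # idealized pure 4-bit storage
q4km_bits = effective_bits(N_PARAMS, 935)    # implied by the reported file

print(f"fp16 weights:        {fp16_mib:.0f} MiB")
print(f"ideal 4-bit weights: {int4_mib:.0f} MiB")
print(f"reported Q4_K_M:     935 MiB (~{q4km_bits:.2f} bits/weight)")
```

Under these assumptions, half-precision weights alone would exceed a 1-2GB budget, while the quantized file fits with a little room to spare.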

## Core Methods: Teacher-Student Distillation Pipeline and CogLang Design

The core architecture is a teacher-student distillation pipeline:

1. Data generation: teacher models (e.g., DeepSeek) generate self-validated question-answer pairs in the drug domain;
2. QLoRA fine-tuning: the original model weights are frozen and only low-rank adapters are trained; fine-tuning the 1.5B model takes 8 minutes on 8GB of VRAM;
3. GGUF quantization: the model is converted to Q4_K_M format, giving a file size of 935MiB (just under 1GB);
4. GBNF syntax constraints: output is forced to comply with CogLang syntax, which guarantees grammatical correctness, leaves the model only the semantics to learn, and supports security audits.

CogLang is a graph-first intermediate language that natively supports node/edge operations and has built-in audit functions.
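
To illustrate what a grammar constraint buys, here is a minimal sketch. The grammar is written in llama.cpp's GBNF notation, but the operations (`neighbors`, `path`, `has_edge`) and the call syntax are hypothetical placeholders invented for this example. They are not the real CogLang syntax, which is defined in the project repository. The regular expression mirrors the toy grammar so the effect can be checked in plain Python.

```python
import re

# Toy grammar in GBNF notation. The operations and argument shapes are
# hypothetical placeholders, NOT the real CogLang syntax.
TOY_GBNF = r'''
root   ::= call
call   ::= op "(" string ("," ws string)? ")"
op     ::= "neighbors" | "path" | "has_edge"
string ::= "\"" [a-z_]+ "\""
ws     ::= " "?
'''

# Regex mirror of the same toy grammar, used only to show which strings
# a grammar-constrained decoder could or could not emit.
TOY_PATTERN = re.compile(
    r'(neighbors|path|has_edge)\("[a-z_]+"(?:, ?"[a-z_]+")?\)'
)

def is_valid(text: str) -> bool:
    """Return True if the whole string is derivable from the toy grammar."""
    return TOY_PATTERN.fullmatch(text) is not None

candidates = [
    'neighbors("drug_a")',
    'has_edge("drug_a", "drug_b")',
    'delete("drug_a")',    # operation not in the grammar
    'neighbors(drug_a)',   # missing quotes
]
for candidate in candidates:
    print(f"{candidate!r:35} valid={is_valid(candidate)}")
```

With llama.cpp, a grammar like `TOY_GBNF` can be saved to a file and supplied through the `--grammar-file` option, so that tokens which would violate the grammar are never sampled. Note that a write operation such as `delete` cannot even be expressed if the grammar does not list it.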

## Domain Application: Drug Safety Knowledge Graph and Zero-Hallucination Design

The project targets the drug safety domain because it is high risk, its data is structured, and its queries are complex. It builds a drug interaction graph with 20 nodes and 21 edges. The safety design is that the small model only generates CogLang queries; the facts are stored in the graph, and answers are retrieved by the query engine. Because the model never states facts itself, this neuro-symbolic hybrid architecture is designed to eliminate the risk of hallucinated answers.
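
The separation between query generation and fact retrieval can be sketched as follows. The graph contents, relation names, and the `neighbors` function are all hypothetical placeholders (the node names are deliberately not real drugs), and the real project executes CogLang rather than a Python function call. The point is only that every answer is read from stored edges, and an unknown entity produces an error instead of a guess.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    src: str
    relation: str
    dst: str

# Hypothetical placeholder graph; not the project's 20-node drug graph.
EDGES = [
    Edge("drug_a", "interacts_with", "drug_b"),
    Edge("drug_a", "contraindicated_in", "condition_x"),
    Edge("drug_c", "interacts_with", "drug_b"),
]
NODES = {name for edge in EDGES for name in (edge.src, edge.dst)}

def neighbors(node: str, relation: str) -> list[str]:
    """Return targets of outgoing `relation` edges from `node`.

    Answers come only from stored edges. An unknown node raises
    instead of returning a plausible-looking guess.
    """
    if node not in NODES:
        raise KeyError(f"unknown node: {node}")
    return sorted(
        edge.dst for edge in EDGES
        if edge.src == node and edge.relation == relation
    )

# The model's only job would be to emit the query; the engine answers it.
print(neighbors("drug_a", "interacts_with"))  # read from the graph

try:
    neighbors("drug_z", "interacts_with")
except KeyError as err:
    print("refused:", err)
```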

## Experimental Results: Performance and Safety Verification

Experimental data:

| Evaluation Scenario | Configuration | Pass Rate |
|---|---|---|
| OOD Test Set | Q4+GBNF+SYSTEM_v6d | 80% (8/10) |
| Real User Prompts (fewshot_v0) | + Few-shot | ~47% (9/19) |
| Real User Prompts (fewshot_v1) | + Optimized Few-shot | ~68% (13/19) |
| Dangerous Write | Protection + Few-shot | 0 (none got through) |

Key findings: prompt engineering matters; optimizing the few-shot examples raised the pass rate on real user prompts from 9/19 to 13/19; and the safety guardrails effectively block dangerous operations.
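
The percentages in the table follow directly from the reported counts. A short check, using only the numbers given above:

```python
# (passed, total) counts as reported in the results table.
RESULTS = {
    "OOD test set": (8, 10),
    "Real user prompts (fewshot_v0)": (9, 19),
    "Real user prompts (fewshot_v1)": (13, 19),
}

def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage."""
    return 100 * passed / total

for name, (passed, total) in RESULTS.items():
    print(f"{name}: {pass_rate(passed, total):.1f}% ({passed}/{total})")

# Improvement from optimizing the few-shot examples.
v0 = pass_rate(*RESULTS["Real user prompts (fewshot_v0)"])
v1 = pass_rate(*RESULTS["Real user prompts (fewshot_v1)"])
print(f"few-shot optimization gain: {v1 - v0:.1f} percentage points")
```

With only 19 real user prompts, the gain corresponds to four additional passing prompts, so the exact percentages should be read with that sample size in mind.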

## Technical Details: Training Experience and Mobile Deployment

Training experience:

1. For chat-format SFT, `completion_only_loss=True` must be set;
2. Teacher-forcing eval_loss has low reference value in small data scenarios;
3. Prioritize prompt engineering over retraining.

Mobile deployment: verified on Android (Termux + llama.cpp) at a speed of ~27 tokens/sec. The project also explains in detail why llama.cpp was chosen over MLC-LLM.
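
As background on the first training point: computing the loss on completions only is commonly implemented by masking the prompt positions in the label sequence, so the model is trained to produce the answer rather than to reproduce the system prompt and question. The sketch below shows that masking in plain Python. The token IDs are made up, and `-100` is the ignore index conventionally used by PyTorch's cross-entropy loss; how the project's training library applies the setting internally is not shown in this post.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def build_labels(
    prompt_ids: list[int], completion_ids: list[int]
) -> tuple[list[int], list[int]]:
    """Concatenate prompt and completion, masking prompt positions in the labels."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels

# Made-up token IDs, purely for illustration.
prompt = [101, 102, 103]   # system prompt + user question
completion = [7, 8, 9]     # the target query

input_ids, labels = build_labels(prompt, completion)
print(input_ids)
print(labels)

# Only completion positions contribute to the loss.
n_trained = sum(label != IGNORE_INDEX for label in labels)
print(f"{n_trained} of {len(labels)} positions contribute to the loss")
```

Without the mask, a long fixed system prompt would dominate the loss on a small dataset, which is consistent with the post's warning that the setting is required for chat-format SFT.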

## Limitations and Future Directions

Limitations: this project is a learning and research demonstration; the drug graph is not authoritative, and the project includes a medical disclaimer. Future plans: retrain v6 to fix the known failure modes, and support multi-turn agent loops to handle reasoning-based queries.

## Conclusion: Pragmatic Engineering Paradigm for Edge AI

The project demonstrates a pragmatic engineering paradigm for edge AI: connect the full pipeline end to end, from data generation to on-device inference. Its core values are reproducibility (complete code and logs), measurability (clear metrics), safety (facts come from the graph, not the model), and practicality (verified on a mobile device). It is a useful reference implementation for edge-side LLM deployment.