Zing Forum

Reading

Practical Guide to Fine-Tuning a Medical Triage Large Model: A Complete MLOps Pipeline Based on Qwen3-1.7B

This article introduces a complete medical triage large model fine-tuning project using the Qwen3-1.7B base model. It performs supervised fine-tuning (SFT) via QLoRA, aligns with human preferences through DPO, and finally deploys as a vLLM+FastAPI inference service. The project covers the entire workflow from data pipeline, training, evaluation to CI/CD deployment.

医疗AI大模型微调QLoRADPOQwen3MLOpsvLLMFastAPI医疗分诊
Published 2026-05-26 17:14Recent activity 2026-05-26 17:21Estimated read 7 min
Practical Guide to Fine-Tuning a Medical Triage Large Model: A Complete MLOps Pipeline Based on Qwen3-1.7B
1

Section 01

Introduction / Main Floor: Practical Guide to Fine-Tuning a Medical Triage Large Model: A Complete MLOps Pipeline Based on Qwen3-1.7B

This article introduces a complete medical triage large model fine-tuning project using the Qwen3-1.7B base model. It performs supervised fine-tuning (SFT) via QLoRA, aligns with human preferences through DPO, and finally deploys as a vLLM+FastAPI inference service. The project covers the entire workflow from data pipeline, training, evaluation to CI/CD deployment.

3

Section 03

Project Background and Objectives

Medical triage is a key component in hospital emergency workflows, which requires quickly assessing the urgency level (immediate/medium/delayed) based on patients' described symptoms. Traditional manual triage relies on experienced nurses, but AI-assisted triage can significantly boost efficiency when medical resources are strained. This project, initiated by Centre Hospitalier Saint-Aurélien (CHSA), aims to build an AI assistant capable of processing both English and French patient descriptions and automatically classifying urgency levels. The project uses the Apache 2.0 open-source license and fully showcases the end-to-end implementation from data preparation to production deployment.

4

Section 04

Technical Architecture Overview

The entire system adopts a layered architecture design, divided into three main modules: data pipeline, training process, and deployment service:

Data Layer: Integrates four public medical Q&A datasets (MediQAL MCQU, FrenchMedMCQA, MedQuAD, UltraMedical). After cleaning and anonymization, it generates 5000 SFT training samples and 5000 DPO preference alignment samples.

Training Layer: Uses Qwen3-1.7B-Base as the foundation model. First, it performs 4-bit quantized supervised fine-tuning via QLoRA (LoRA rank set to 16), then aligns with human preferences through DPO (Direct Preference Optimization). The training process uses MLflow for experiment tracking, and model weights are stored in Google Cloud Storage.

Inference Layer: The merged complete model is deployed via vLLM, supporting continuous batching and PagedAttention optimization, and provides a FastAPI REST interface externally. The entire service is containerized and deployed on a GCP virtual machine, with CI/CD automation implemented via GitHub Actions.

5

Section 05

Efficient Fine-Tuning with QLoRA

QLoRA (Quantized Low-Rank Adaptation) is one of the core technologies of this project. By adding low-rank adapters to the 4-bit Normal Float quantized base model, training can be completed on a single GPU with 16GB VRAM (such as T4, L4). Compared to full-parameter fine-tuning, QLoRA reduces VRAM usage by approximately 75% while maintaining good fine-tuning results.

6

Section 06

DPO Preference Alignment

Traditional RLHF (Reinforcement Learning from Human Feedback) requires training a reward model, which has a complex process. DPO learns directly from preference data, transforming the problem into a simple classification task and greatly simplifying the implementation. The DPO data in the project uses the triplet format (question, preferred answer, non-preferred answer) from the UltraMedical-Preference dataset.

7

Section 07

DVC Data Version Control

Medical data involves privacy and compliance requirements, so the project uses DVC (Data Version Control) to manage the data pipeline. From raw data download to final training set generation, 6 processing stages are defined (clean → anonymize → tokenize → split). Any parameter change will automatically trigger the re-execution of the corresponding stage.

8

Section 08

Training Results and Evaluation

In the SFT phase, the loss on the training set dropped to 1.112, and the validation set loss was 1.189, indicating good model convergence. The project includes 70 unit tests covering the data pipeline, API interfaces, and model inference logic. Currently, the project is in the 4th week of the deployment phase, the API service is ready, and final production environment verification is underway. The DPO-aligned model and the complete technical report are also being developed in parallel.