Zing Forum

Locally Run Large Model Fine-Tuning Practice: Efficient Training Scheme for Qwen3-4B Based on LoRA and DoRA

This article introduces a fully locally run large language model fine-tuning project that uses LoRA and DoRA technologies for parameter-efficient fine-tuning of Qwen3-4B-Instruct, enabling training on consumer-grade CPUs without GPUs or cloud services.

Tags: LoRA, DoRA, Qwen3, large model fine-tuning, parameter-efficient fine-tuning, PEFT, local training, Ollama, CPU training
Published 2026-06-03 04:44 | Recent activity 2026-06-03 04:47 | Estimated read 5 min

Section 01

Introduction: Qwen3-4B Fine-Tuning Practice on Local CPUs Using LoRA/DoRA

This project is maintained by Hassan Butt and hosted on GitHub (project link: https://github.com/Hassan-Butt4356/llm-finetuning-lora-dora). It trains the Qwen3-4B-Instruct model locally on consumer-grade CPUs using two parameter-efficient fine-tuning (PEFT) techniques, LoRA and DoRA, without requiring GPUs or cloud services. The goal is to lower the barrier for individual developers who want to customize large models.

Section 02

Background: The Necessity of Parameter-Efficient Fine-Tuning (PEFT)

As the parameter counts of large models keep growing, full fine-tuning becomes very expensive and demands hardware that most individuals do not have. PEFT freezes most parameters of the base model and trains only a small number of newly added parameters, often achieving results comparable to full fine-tuning. This project applies LoRA and DoRA to fine-tune Qwen3-4B-Instruct on consumer-grade CPUs, so individual developers can try large model customization themselves.
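To make "a small number of newly added parameters" concrete, the sketch below counts the trainable parameters that a LoRA adapter adds to a single dense layer. The layer size of 4096 x 4096 is a hypothetical value chosen for illustration and is not taken from the Qwen3-4B configuration; the rank of 8 matches the LORA_RANK=8 setting mentioned in the project workflow.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen dense weight matrix W (d_out x d_in)."""
    return d_out * d_in


# Hypothetical layer size for illustration; not read from any real model config.
D_OUT, D_IN, RANK = 4096, 4096, 8

trainable = lora_trainable_params(D_OUT, D_IN, RANK)
frozen = full_params(D_OUT, D_IN)
fraction = trainable / frozen

print(f"frozen={frozen:,} trainable={trainable:,} fraction={fraction:.4%}")
```

Under these assumed dimensions the adapter trains 65,536 parameters against 16,777,216 frozen ones, which is about 0.39% of the layer. The real ratio depends on which layers are adapted and on their actual shapes.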

Section 03

Technical Principles: Core Mechanisms of LoRA and DoRA

LoRA keeps the pretrained weight W frozen and approximates the weight update with the product of two low-rank matrices B and A, giving h = Wx + BAx. Its advantages include low memory usage, fast training, easy switching between adapters on the same base model, and no added inference latency once BA is merged into W. DoRA builds on LoRA by decomposing the weight into a magnitude m and a direction: W' = m * (W + BA) / ||W + BA||, where the norm is taken per column. Its advantages are more stable training dynamics and better performance on small datasets, at the cost of training that is 10-15% slower and slightly higher memory usage.
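The two formulas can be checked on a tiny example. The following is a minimal sketch in plain Python (no ML framework), not the project's training code. The helper names and the 2 x 2 matrices are illustrative assumptions, and the norm in the DoRA formula is assumed to be the per-column L2 norm.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def column_norms(w: Matrix) -> list[float]:
    """L2 norm of each column of w."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def matvec(w: Matrix, x: list[float]) -> list[float]:
    """Compute h = W x for a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def lora_weight(w: Matrix, b: Matrix, a: Matrix) -> Matrix:
    """Effective LoRA weight W + BA (W stays frozen; only B and A are trained)."""
    return add(w, matmul(b, a))


def dora_weight(w: Matrix, b: Matrix, a: Matrix, m: list[float]) -> Matrix:
    """Effective DoRA weight m * (W + BA) / ||W + BA||, normalized per column."""
    v = lora_weight(w, b, a)
    norms = column_norms(v)
    return [[m[j] * v[i][j] / norms[j] for j in range(len(v[0]))]
            for i in range(len(v))]


# Illustrative values: a 2 x 2 frozen weight and a rank-1 adapter.
W = [[3.0, 0.0], [4.0, 5.0]]
B = [[1.0], [0.0]]          # 2 x 1
A = [[0.5, 0.0]]            # 1 x 2
m = column_norms(W)         # magnitude initialized from W's column norms
x = [1.0, 1.0]

h_lora = matvec(lora_weight(W, B, A), x)
h_dora = matvec(dora_weight(W, B, A, m), x)
print("LoRA:", h_lora)
print("DoRA:", h_dora)
```

Two properties are worth noting. With B set to zero, both effective weights reduce to W, so training starts from the base model's behavior. With a non-zero update, the column norms of the DoRA weight stay equal to m, which is what separates the magnitude from the direction.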

Section 04

Project Practice: Full Workflow from Data Preparation to Model Deployment

The project workflow has four steps:
1. Environment preparation: install Python 3.10+ and Ollama, then download the Qwen3-4B-Instruct weights.
2. Data preprocessing: automatically extract text from PDFs and convert it to JSONL format.
3. Training configuration: set adjustable parameters such as LORA_RANK=8 and EPOCHS=1. Training on 50 samples takes 20-40 minutes on an Intel Core Ultra 7 255H.
4. Model export: merge the adapters, export to GGUF format, and call the model through the Ollama local API.
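The data preprocessing step can be sketched as follows, assuming the PDF text has already been extracted into a plain string. This is not the project's own script: the function names, the `"text"` field, and the 500-character chunk size are hypothetical choices for illustration, and the repository may use a different JSONL schema.

```python
import json


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs left by repeated blank lines
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def to_jsonl(chunks: list[str]) -> str:
    """Serialize chunks as JSON Lines: one JSON object per line."""
    # The "text" key is an assumed field name, not confirmed by the project.
    return "\n".join(json.dumps({"text": c}, ensure_ascii=False) for c in chunks)


sample = "Para one.\n\nPara two.\n\n\n\nPara three."
sample_chunks = chunk_text(sample, max_chars=20)
jsonl = to_jsonl(sample_chunks)
print(jsonl)
```

Because `json.dumps` escapes newlines inside each chunk, every record stays on a single line, which is what lets a training script read the file one line at a time.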

Section 05

Technical Comparison: Feature Differences Between LoRA and DoRA and Selection Recommendations

| Feature | LoRA | DoRA |
| --- | --- | --- |
| Training Speed | Baseline | 10-15% slower |
| Small Dataset Quality | Good | Better |
| Large Dataset Quality | Very Good | Very Good |
| Memory Usage | Lower | Slightly Higher |
| Implementation Complexity | Simple | Moderate |

Selection Recommendations: Choose LoRA for speed; choose DoRA if data volume is limited and high quality is required.


Section 06

Practical Significance: Application Scenarios of Local Fine-Tuning

This project lowers the barrier to fine-tuning large models. Application scenarios include personal knowledge bases (turning notes and papers into intelligent assistants), enterprise document Q&A (internal systems), educational assistance (subject-specific models), and privacy protection (processing sensitive data locally).

Section 07

Summary and Outlook: Future Directions of PEFT Technology

LoRA and DoRA are mainstream PEFT techniques, and this project demonstrates that both can be used on consumer-grade hardware. As the technology advances, running and fine-tuning larger models on personal devices is expected to become practical. The project is a good starting point for developers who want a deeper understanding of large model training.