Locally Run Large Model Fine-Tuning Practice: Efficient Training Scheme for Qwen3-4B Based on LoRA and DoRA

This article introduces a fully locally run large language model fine-tuning project that uses LoRA and DoRA technologies for parameter-efficient fine-tuning of Qwen3-4B-Instruct, enabling training on consumer-grade CPUs without GPUs or cloud services.

Tags: LoRA, DoRA, Qwen3, large model fine-tuning, parameter-efficient fine-tuning, PEFT, local training, Ollama, CPU training

Published 2026-06-03 04:44 | Recent activity 2026-06-03 04:47 | Estimated read 5 min

Section 01

Introduction: Qwen3-4B Fine-Tuning Practice on Local CPUs Using LoRA/DoRA

This project is maintained by Hassan Butt and hosted on GitHub (Project link: https://github.com/Hassan-Butt4356/llm-finetuning-lora-dora). It aims to achieve local training of the Qwen3-4B-Instruct model on consumer-grade CPUs using two parameter-efficient fine-tuning (PEFT) technologies—LoRA and DoRA—without requiring GPUs or cloud services, thereby lowering the barrier for individual developers to customize large models.

Section 02

Background: The Necessity of Parameter-Efficient Fine-Tuning (PEFT)

As the parameter counts of large models keep growing, full fine-tuning becomes expensive and demands hardware that most individuals do not have. PEFT freezes the base model's weights and trains only a small number of newly added parameters, and it can often reach results close to those of full fine-tuning. This project applies two PEFT methods, LoRA and DoRA, to fine-tune Qwen3-4B-Instruct on consumer-grade CPUs, so that individual developers can try large model customization themselves.

Section 03

Technical Principles: Core Mechanisms of LoRA and DoRA

LoRA keeps the pretrained weight matrix W frozen and approximates the weight update with the product of two low-rank matrices B and A, so the layer output becomes h = Wx + BAx. Only A and B are trained, which gives low memory usage, fast training, and easy switching between adapters. Because BA can be merged into W after training, LoRA adds no inference latency. DoRA builds on LoRA by decomposing the weight into a magnitude and a direction: W' = m * (W + BA) / ||W + BA||, where m is a learned magnitude vector and the direction is updated through the LoRA term BA. Its advantages are more stable training dynamics and better performance on small datasets, but training is 10-15% slower and memory usage is slightly higher.
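The two formulas can be checked numerically with a small NumPy sketch. This is an illustration only, not the project's training code: the matrix sizes, the random initialisation, and the choice to take the norm per column of the weight matrix are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen for illustration; r is the LoRA rank.
d_out, d_in, r = 64, 48, 4

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init so BA = 0 at the start


def lora_forward(x, W, A, B):
    """h = Wx + BAx, computed without forming the full d_out x d_in matrix BA."""
    return W @ x + B @ (A @ x)


def dora_weight(W, A, B, m):
    """W' = m * (W + BA) / ||W + BA||, with the norm taken per column (assumed)."""
    V = W + B @ A
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)  # shape (1, d_in)
    return m * V / col_norm


# Initialise the magnitude to the column norms of W, so that W' = W before training.
m = np.linalg.norm(W, axis=0, keepdims=True)

x = rng.normal(size=d_in)
base = W @ x

# With B = 0, both adapters reproduce the frozen model's output.
assert np.allclose(lora_forward(x, W, A, B), base)
assert np.allclose(dora_weight(W, A, B, m) @ x, base)

# Simulate a training update to B: the direction changes, but the
# column norms of the DoRA weight stay pinned to the magnitude m.
B = rng.normal(size=(d_out, r)) * 0.1
W_dora = dora_weight(W, A, B, m)
assert np.allclose(np.linalg.norm(W_dora, axis=0, keepdims=True), m)

# Trainable parameter counts for this single layer.
full_params = d_out * d_in            # full fine-tuning
lora_params = r * (d_out + d_in)      # A and B only
dora_params = lora_params + d_in      # A, B and the magnitude vector m
print(full_params, lora_params, dora_params)
```

The parameter counts at the end show where the savings come from: the adapter size grows with r * (d_out + d_in) rather than d_out * d_in, and DoRA adds only one extra magnitude value per column.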

Section 04

Project Practice: Full Workflow from Data Preparation to Model Deployment

The project workflow has four steps:
1. Environment preparation: install Python 3.10+ and Ollama, then download the Qwen3-4B-Instruct weights.
2. Data preprocessing: automatically extract text from PDFs and convert it to JSONL format.
3. Training configuration: set adjustable parameters such as LORA_RANK=8 and EPOCHS=1. Training on 50 samples takes 20-40 minutes on an Intel Core Ultra 7 255H.
4. Model export: merge the adapters into the base model, convert the result to GGUF format, and call it through the local Ollama API.
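The data preprocessing step can be sketched with the standard library alone. The example below assumes the PDF text has already been extracted into a string, and it is not the project's actual script: the function names `chunk_text` and `write_jsonl`, the `max_chars` limit, and the one-field `{"text": ...}` record schema are all hypothetical choices for illustration.

```python
import json
from pathlib import Path


def chunk_text(text, max_chars=800):
    """Group paragraphs into chunks of roughly max_chars characters.

    Paragraphs are separated by blank lines. A single paragraph longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def write_jsonl(chunks, out_path):
    """Write one JSON object per line; the {"text": ...} schema is assumed."""
    with Path(out_path).open("w", encoding="utf-8") as f:
        for chunk in chunks:
            # ensure_ascii=False keeps non-English text readable in the file.
            f.write(json.dumps({"text": chunk}, ensure_ascii=False) + "\n")
    return len(chunks)
```

A call such as `write_jsonl(chunk_text(extracted_text), "train.jsonl")` would then produce the training file, where `extracted_text` stands for whatever the PDF extraction step returns.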

Section 05

Technical Comparison: Feature Differences Between LoRA and DoRA and Selection Recommendations

Feature	LoRA	DoRA
Training Speed	Baseline	10-15% slower
Small Dataset Quality	Good	Better
Large Dataset Quality	Very Good	Very Good
Memory Usage	Lower	Slightly Higher
Implementation Complexity	Simple	Moderate
Selection Recommendations: Choose LoRA when training speed matters most; choose DoRA when the dataset is small and output quality matters more than the extra training time.
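In code, switching between the two methods can come down to a single flag. The sketch below builds a plain dictionary whose keys mirror parameters of `LoraConfig` in Hugging Face's peft library, where DoRA is enabled with `use_dora=True`. Whether the project uses peft is not stated in the article, so the helper name `adapter_kwargs`, the `lora_alpha` and `lora_dropout` values, and the `target_modules` list are assumptions for illustration.

```python
def adapter_kwargs(method, rank=8):
    """Return keyword arguments for a LoRA or DoRA adapter configuration.

    The keys follow peft.LoraConfig; with peft installed, the result could
    be passed as LoraConfig(**adapter_kwargs("dora")).
    """
    if method not in ("lora", "dora"):
        raise ValueError(f"unknown method: {method!r}")
    return {
        "r": rank,                      # matches LORA_RANK=8 from the article
        "lora_alpha": 2 * rank,         # assumed scaling; a common heuristic
        "lora_dropout": 0.05,           # assumed value
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
        "use_dora": method == "dora",   # the only difference between the two
        "task_type": "CAUSAL_LM",
    }


lora_cfg = adapter_kwargs("lora")
dora_cfg = adapter_kwargs("dora")

# Everything except the use_dora flag is identical.
differing = {k for k in lora_cfg if lora_cfg[k] != dora_cfg[k]}
print(differing)
```

Keeping every other setting identical makes it straightforward to train both variants on the same data and compare them directly.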

Section 06

Practical Significance: Application Scenarios of Local Fine-Tuning

This project lowers the threshold for large model fine-tuning. Application scenarios include: personal knowledge bases (converting notes/papers into intelligent assistants), enterprise document Q&A (internal systems), educational assistance (subject-customized models), and privacy protection (local processing of sensitive data).

Section 07

Summary and Outlook: Future Directions of PEFT Technology

LoRA and DoRA are mainstream directions in PEFT, and this project demonstrates that both can be run on consumer-grade hardware. As the technology advances, running and fine-tuning larger models on personal devices is expected to become practical. The project is a useful starting point for developers who want a hands-on understanding of large model training.