Zing Forum

Reading

Practical Guide to Fine-Tuning Local Large Language Models: Detailed Explanation of LoRA and DoRA Parameter-Efficient Fine-Tuning Techniques

A complete guide to local LLM fine-tuning using the Qwen3-4B model, enabling LoRA and DoRA adapter training on consumer CPUs without a GPU, and exporting to GGUF format compatible with Ollama.

LoRADoRA参数高效微调Qwen3本地部署Ollama大语言模型PEFTCPU训练GGUF
Published 2026-06-03 04:44Recent activity 2026-06-03 04:48Estimated read 7 min
Practical Guide to Fine-Tuning Local Large Language Models: Detailed Explanation of LoRA and DoRA Parameter-Efficient Fine-Tuning Techniques
1

Section 01

【Introduction】Practical Fine-Tuning of Qwen3-4B on Local CPU: Detailed Explanation of LoRA and DoRA Techniques

This project, published by Hassan Butt on GitHub (link: https://github.com/Hassan-Butt4356/llm-finetuning-lora-dora), provides a complete guide to parameter-efficient fine-tuning of the Qwen3-4B-Instruct model using LoRA/DoRA on consumer CPUs without a GPU. It also supports exporting to GGUF format compatible with Ollama, significantly lowering the technical barrier for individual developers and small teams to fine-tune large models locally.

2

Section 02

Background: The Necessity of Parameter-Efficient Fine-Tuning

Traditional full fine-tuning of large models requires expensive GPU clusters, with extremely high demands on memory and storage, which hinders local fine-tuning by individuals and small teams. Parameter-Efficient Fine-Tuning (PEFT) techniques only train a small number of additional parameters, reducing resource requirements while maintaining performance. This project provides a local solution based on the PEFT approach.

3

Section 03

Comparison of LoRA and DoRA Technical Principles

LoRA Principle

Core idea: Weight changes during model fine-tuning have low intrinsic rank. Introduce low-rank matrices A (random Gaussian initialization) and B (zero initialization), output h=Wx+BAx, and only update A and B during training. Advantages: High parameter efficiency (saves ~98% parameters when rank r=8), modularity, composability.

DoRA Improvements

Decompose pre-trained weights into magnitude and direction components, and apply low-rank updates only to the direction component.

Comparison

Feature LoRA DoRA
Training Speed Baseline 10-15% slower
Small Dataset Quality Good Better
Large Dataset Quality Very Good Very Good
Memory Usage Less Slightly More

The project provides train_lora.py and train_dora.py scripts for selection.

4

Section 04

Complete Workflow and Hardware Requirements

Hardware Requirements

  • CPU: Intel Core Ultra7 255H or equivalent
  • Memory: 32GB (minimum 16GB)
  • Storage: ~30GB free space
  • System: Windows10/11 or Linux
  • Python3.10+ and Ollama installed

7-Step Workflow

  1. Download Qwen3-4B-Instruct base model (~8GB)
  2. Put PDFs into the my_pdfs/ directory; the script automatically extracts text to generate JSONL data (requires Ollama to run) 3-4. Run LoRA/DoRA training script (~20-40 minutes for 50 samples, ~5-10 hours for 500 samples)
  3. Merge adapter with base model (generate ~16GB complete model)
  4. Convert to GGUF format using llama.cpp
  5. Test the model to verify results
5

Section 05

Interpretation of Key Configuration Parameters

Adjustable parameters in config.py:

Parameter Default Value Description
LORA_RANK 8 Higher rank means larger capacity but slower training
EPOCHS 1 CPU training recommended to start with 1 epoch
MAX_SEQ_LEN 512 Reducing it speeds up CPU training
LEARNING_RATE 2e-4 Standard learning rate for LoRA
MAX_CHUNKS_PER_PDF 30 Increasing gives more training data
6

Section 06

Practical Application Value of the Project

  • Data Privacy: All processing is done locally; sensitive documents do not need to be uploaded to the cloud
  • Cost-Effectiveness: No need to rent GPU servers; existing hardware can customize models
  • Fast Iteration: Adapters are only ~50MB, easy for version management and deployment
  • Scalability: Modular design is easy to integrate into MLOps workflows
7

Section 07

Summary of Technical Points and Practical Recommendations

Key points to note for project reproduction:

  1. Memory Management: 32GB is a comfortable configuration; 16GB requires adjusting MAX_SEQ_LEN and BATCH_SIZE
  2. Data Quality: Text extraction quality from PDFs affects results; preprocessing to clean noise is recommended
  3. Iteration Strategy: First validate the process with small datasets, then expand to full data
  4. Training Monitoring: Observe loss curves and adjust learning rate timely

This project provides a blueprint for local LLM fine-tuning for individuals and small-to-medium teams, demonstrating the value of PEFT techniques in resource-constrained environments.