Zing Forum

Reading

Practical Guide to Function Calling Fine-Tuning for Turkish Large Language Models: Optimization of the Kizagan Model Using Unsloth and QLoRA

A complete Jupyter Notebook tutorial demonstrating how to add function calling capabilities to a Turkish reasoning model using Unsloth and QLoRA technologies on an H100 GPU, covering the entire workflow from environment setup to model deployment.

Turkish LLMfunction callingUnslothQLoRAfine-tuningH100Flash AttentionLoRAcatastrophic forgettingreasoning model
Published 2026-05-26 16:46Recent activity 2026-05-26 16:52Estimated read 5 min
Practical Guide to Function Calling Fine-Tuning for Turkish Large Language Models: Optimization of the Kizagan Model Using Unsloth and QLoRA
1

Section 01

Introduction / Main Floor: Practical Guide to Function Calling Fine-Tuning for Turkish Large Language Models: Optimization of the Kizagan Model Using Unsloth and QLoRA

A complete Jupyter Notebook tutorial demonstrating how to add function calling capabilities to a Turkish reasoning model using Unsloth and QLoRA technologies on an H100 GPU, covering the entire workflow from environment setup to model deployment.

3

Section 03

Project Overview

This is a complete fine-tuning tutorial for Turkish large language models, aiming to add function calling capabilities to a base reasoning model. The project uses the Unsloth framework combined with QLoRA technology, optimized specifically for the NVIDIA H100 GPU, fully leveraging the bfloat16 and Flash Attention 2 features of the Hopper architecture.

The base model used is AlicanKiraz0/Kizagan-E4B-Turkish-Reasoning-Model, a Turkish model focused on reasoning capabilities. The fine-tuned model will have both function calling and general Turkish dialogue capabilities.


4

Section 04

Unsloth Framework

Unsloth is one of the most popular optimization frameworks in the field of large model fine-tuning currently. Its core advantages include:

  • Automatic Flash Attention 2 Activation: Automatically enabled on supported hardware, significantly improving training speed
  • 4-bit NF4 Quantization: Greatly reduces memory usage, making large model fine-tuning possible on consumer-grade hardware
  • bfloat16 Optimization: Natively compatible with H100, more stable than fp16
  • Gradient Checkpoint Integration: Unsloth provides a specially optimized version, balancing memory and speed
5

Section 05

QLoRA (Quantized Low-Rank Adaptation)

QLoRA is the mainstream method for Parameter-Efficient Fine-Tuning (PEFT) currently. Its working principle is:

  1. Freeze Base Model Parameters: Keep pre-trained weights unchanged to avoid catastrophic forgetting
  2. Inject Trainable Low-Rank Matrices: Add small adapters to key layers (q_proj, k_proj, v_proj, o_proj, etc.)
  3. 4-bit Quantized Storage: The base model is stored in NF4 format, greatly reducing memory requirements
  4. Dequantization During Training: Dynamically restore to bf16 during computation to ensure accuracy

The LoRA parameters configured in this project are r=32 and alpha=32, which are relatively high configurations and can learn more complex patterns.

6

Section 06

Flash Attention 2

Flash Attention is an IO-aware exact attention algorithm. Through block computation and recomputation strategies, it reduces the memory complexity of standard attention from O(N²) to O(N). On the H100, this can bring a 2-3x improvement in training speed.


7

Section 07

Dataset Strategy: Preventing Catastrophic Forgetting

The project uses a mixed dataset strategy, which is a key consideration in function calling fine-tuning:

8

Section 08

Dataset Composition

Dataset Ratio Purpose
Turkish Hermes Function Calling 85% Function calling training data, containing 144K Turkish prompt/completion pairs
Turkish LLM v10 Training 15% General Turkish dialogue data to maintain basic language capabilities