Zerfoo: A Pure Go Machine Learning Framework with Zero CGo for High-Performance Inference and Training

Zerfoo is a machine learning framework implemented purely in Go, enabling ML model training, execution, and deployment without CGo. It supports 41 model architectures and offers advanced features such as memory-mapped loading, EAGLE speculative decoding, QuaRot quantization, and hierarchical KV caching. In benchmarks on NVIDIA DGX Spark, it is up to 28% faster than Ollama.

Tags: Go, machine learning, ML inference, GGUF, quantization, LoRA, KV cache, memory mapping, EAGLE

Published 2026-06-11 09:44 | Recent activity 2026-06-11 09:49 | Estimated read 10 min

Section 01

[Introduction] Zerfoo: Pure Go Zero-CGo ML Framework with Full Support for High-Performance Inference and Training

Zerfoo is a pure Go machine learning framework developed and maintained by zerfoo and released on GitHub on June 11, 2026 (link: https://github.com/zerfoo/zerfoo). Its defining feature is zero CGo dependency: training, running, and deploying models requires no C language bindings. It supports 41 model architectures spanning domains such as text generation, speech recognition, and tabular data prediction, and offers advanced features such as memory-mapped loading, EAGLE speculative decoding, QuaRot quantization, and hierarchical KV caching. On NVIDIA DGX Spark hardware, its inference is up to 28% faster than Ollama's, depending on the model.

Section 02

[Background] Pain Points of ML in the Go Ecosystem and Zerfoo's Solutions

Go developers who integrate machine learning capabilities have traditionally relied on CGo bindings to Python/C++ libraries, which complicates builds and makes cross-compilation difficult. Zerfoo avoids this with a pure Go implementation: an ML-capable application builds with a plain go build and no additional dependencies. The framework covers the entire workflow from training to deployment, supporting scenarios such as text generation, embedding extraction, speech recognition, tabular data prediction, and time series prediction. Its design goal is to make ML development as simple as writing ordinary Go programs.

Section 03

[Core Technologies] Key Features and Implementation Methods of Zerfoo

Zerfoo's core technical features include:

Memory-mapped loading: GGUF models are loaded with mmap by default and paged in on demand, so a model can exceed physical memory (e.g., a 128GB machine can run the 128.8GB MiniMax-M2 model);
EAGLE speculative decoding: Accelerates generation without a separate draft model, and supports training custom EAGLE prediction heads;
QuaRot quantization and KV cache optimization: 4-bit quantization with Hadamard rotation, a three-tier KV cache (hot/warm/cold) that manages long sequences automatically, and asynchronous prefetching to sustain performance;
TransMLA: Converts MHA/GQA to multi-head latent attention via SVD decomposition, reducing the KV cache by over 80%;
Multi-LoRA serving: A single base model serves multiple LoRA adapters, selected per request through the OpenAI-compatible API.

Code examples for some of these features:

Memory-mapped loading: m, _ := zerfoo.Load("./MiniMax-M2-Q4_K_M-00001-of-00003.gguf")
EAGLE training: zerfoo eagle-train --model model.gguf --corpus data.txt --output eagle-head.gguf --epochs 5
TransMLA conversion: zerfoo transmla --rank 512 --input model.gguf --output model-mla.gguf

Section 04

[Performance & Models] Benchmarks and Supported Scope of Zerfoo

Performance: In tests on NVIDIA DGX Spark (Grace Blackwell architecture, 128GB unified memory), Zerfoo outpaces Ollama on each of the four models tested, by 2% to 28%:

Model	Parameters	Quantization	Zerfoo (tok/s)	Ollama (tok/s)	Speedup
Gemma3 1B	1B	Q4_K_M	241	188	1.28x
DeepSeek R1 1.5B	1.5B	Q4_K_M	190	174	1.09x
Llama3.2 3B	3B	Q4_K_M	95	93	1.02x
Mistral 7B	7B	Q5_K_M	46	45	1.02x

The advantage is attributed to CUDA graph capture (covering 99.5% of instructions) and fused kernels. It narrows as models grow: from 1.28x on the 1B model to 1.02x on the 3B and 7B models.

Model Support: Covers 41 architectures across 25 families:

Mainstream text: Gemma3, Llama3/4, Mistral, etc.;
Efficient architectures: RWKV, Mamba, Jamba;
MoE models: MiniMax M2, DBRX, Kimi K2;
Multimodal: LLaVA, Qwen-VL, Whisper;
Specialized models: StarCoder2 (code), Granite series (time series).

It also includes built-in tabular and time series models (e.g., MLP Ensemble, TFT) with a concise training interface: model := tabular.NewEnsemble[float32](engine, ...); trainer.Fit(ctx, trainX, trainY)

Section 05

[Usage & Deployment] Quick Start and Service Launch of Zerfoo

Basic Usage Examples:

Basic inference: m, _ := zerfoo.Load("google/gemma-3-4b"); response, _ := m.Chat("Explain Go interfaces in one sentence.")
Streaming generation: ch, err := m.ChatStream(context.Background(), "Tell me a joke."); if err != nil { log.Fatal(err) }; for tok := range ch { fmt.Print(tok.Text) }
Structured JSON output: Pass a JSON schema via WithSchema to constrain the output format, for example: schema := grammar.JSONSchema{Type: "object", Properties: map[string]*grammar.JSONSchema{"name": {Type: "string"}, "age": {Type: "number"}}, Required: []string{"name", "age"}}
Tool calling: Define tool functions (e.g., get_weather), and the model will call them automatically.
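
On the consuming side, schema-constrained output is useful because it can be decoded straight into a Go type. The sketch below uses only the standard library and no Zerfoo API. It assumes the model returned a JSON string for the name/age schema shown above. The person type, the decodePerson helper, and the sample output are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// person mirrors the schema in the text: an object with required
// "name" (string) and "age" (number). Pointer fields distinguish a
// missing field from a zero value.
type person struct {
	Name *string  `json:"name"`
	Age  *float64 `json:"age"`
}

// decodePerson parses raw JSON and enforces the schema's required fields.
func decodePerson(raw string) (string, float64, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields() // reject properties outside the schema
	var p person
	if err := dec.Decode(&p); err != nil {
		return "", 0, err
	}
	if p.Name == nil || p.Age == nil {
		return "", 0, errors.New("missing required field")
	}
	return *p.Name, *p.Age, nil
}

func main() {
	// Hypothetical model output; no model is invoked here.
	name, age, err := decodePerson(`{"name": "Ada", "age": 36}`)
	if err != nil {
		fmt.Println("invalid output:", err)
		return
	}
	fmt.Println(name, age)
}
```

Validating again after decoding is a defensive choice: even when generation is schema-constrained, a truncated response (for example, one cut off by a token limit) can still fail to parse.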

Deployment Methods:

Install CLI: go install github.com/zerfoo/zerfoo/cmd/zerfoo@latest
As a library: go get github.com/zerfoo/zerfoo
Start API service: zerfoo serve (compatible with OpenAI API, supports multi-LoRA, tool calling, etc.)

Section 06

[Practical Significance] Application Scenarios and Value of Zerfoo

Zerfoo fills a key gap in the ML field of the Go ecosystem, with its zero-CGo design bringing significant advantages:

Cloud-native deployment: A single binary with no dependencies is well suited to containers and serverless platforms;
Edge computing: Memory mapping lets devices with limited memory run large models;
Enterprise applications: Structured output and tool calling support building reliable AI agents;
Real-time systems: Streaming generation and low-latency inference are suitable for dialogue systems;
Multimodal applications: Built-in support for speech recognition and vision-language models.

Section 07

[Summary & Recommendations] Future Outlook and Usage Suggestions for Zerfoo

Summary: With its pure Go, zero-CGo design, Zerfoo delivers a full-featured ML framework while remaining compatible with the Python ecosystem (GGUF, Hugging Face). It gives Go developers AI development capabilities without switching tech stacks, and gives ML engineers an efficient path to production deployment.

Recommendations:

Go developers: Try integrating ML features into existing applications using Zerfoo;
ML engineers: Use its zero-dependency feature to simplify model deployment;
Follow project updates: The GitHub repository is continuously maintained, so track it for new features and optimizations.
