micro-kiki: Innovative Practice of 35 Domain Expert LoRAs and Cognitive Layer Architecture

A multi-domain expert system built on Qwen3.6-35B-A3B that achieves precise reasoning and continuous learning in professional fields through 35 LoRA adapters and a three-layer cognitive architecture consisting of Aeon Memory, CAMP Negotiation, and KnowBias Anti-bias.

Tags: LoRA, MoE, Qwen, domain experts, MLX, multimodal, routing, cognitive architecture, catastrophic forgetting, quantization, inference, open-source models
Published 2026-04-21 00:44 · Recent activity 2026-04-21 00:51 · Estimated read 6 min

Section 01

micro-kiki: Innovative Practice of 35 Domain Expert LoRAs and Cognitive Layer Architecture

micro-kiki is a multi-domain expert system built on Qwen3.6-35B-A3B. It achieves precise reasoning and continuous learning in professional fields through 35 LoRA adapters and a three-layer cognitive architecture (Aeon Memory, CAMP Negotiation, KnowBias Anti-bias). This article covers the project's background, architecture, training, and deployment.


Section 02

Project Background and Core Positioning

micro-kiki is the deployed result of the dreamOfkiki research project under Hypneum Lab, led by Clément Saillant. Its core goal is to build an AI system capable of handling 35 professional domains. The base model is Qwen3.6-35B-A3B, whose MoE architecture (256 experts, with only about 3 billion parameters active per token) balances efficiency and capacity and supports an ultra-long context of 262,000 tokens.


Section 03

Innovation of Three-Layer Cognitive Architecture

micro-kiki introduces a three-layer cognitive architecture:

  1. MetaRouter: a sigmoid classifier that supports multi-domain activation (up to 4 adapters simultaneously), routes on semantic features, and handles cross-domain problems;
  2. Aeon Memory System: a dual-store architecture (Atlas semantic memory, Trace graph-structured memory) that maintains context coherence in multi-turn dialogue; in tests it averaged more than 36 memory recalls across 14 dialogue turns;
  3. CAMP Negotiation and KnowBias Filtering: coordinates the opinions of multiple experts to prevent groupthink, and keeps output neutral and professional through bias detection and framing deconstruction.
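The routing step can be sketched as a multi-label sigmoid classifier that activates every domain whose score clears a threshold, capped at 4 adapters. This is an illustrative sketch only: the domain names, threshold, and random projection standing in for a trained classifier head are assumptions, not the project's actual implementation.

```python
import numpy as np

RNG = np.random.default_rng(0)
DOMAINS = ["kicad-dsl", "spice-sim", "stm32", "electronics", "dsp"]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def route(features: np.ndarray, weights: np.ndarray, bias: np.ndarray,
          threshold: float = 0.5, max_active: int = 4) -> list[str]:
    """Return up to `max_active` domains whose sigmoid score exceeds
    `threshold`, highest score first (multi-label, not softmax)."""
    scores = sigmoid(weights @ features + bias)   # one independent score per domain
    ranked = np.argsort(scores)[::-1]             # best first
    active = [i for i in ranked if scores[i] > threshold][:max_active]
    return [DOMAINS[i] for i in active]

# Toy example: random weights stand in for a trained classifier head.
features = RNG.normal(size=16)
weights = RNG.normal(size=(len(DOMAINS), 16))
bias = np.zeros(len(DOMAINS))
print(route(features, weights, bias))
```

Because each score is an independent sigmoid rather than a softmax, a cross-domain query (say, STM32 firmware driving a SPICE-simulated circuit) can legitimately activate several adapters at once.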

Section 04

Technical Details of LoRA Adapter Training

Optimal LoRA training configuration: 32 of the model's 40 layers, rank=16 / alpha=16, learning rate 1e-5, 100-1000 iterations. Training hardware is a Mac Studio M3 Ultra with 512 GB unified memory (BF16 training peaks at 107 GB). Forgetting-gate mechanism: a rollback is triggered when the angle between the new adapter's weights and an existing adapter's is under 30 degrees (i.e., high cosine similarity) and the benchmark win rate drops by more than 3%, preventing catastrophic forgetting.
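The forgetting-gate check described above can be sketched as follows: flatten the adapter weights, measure the angle between the new adapter and each existing one, and roll back when the directions are too similar and the win rate has regressed. The function names, thresholds as code constants, and toy vectors are illustrative assumptions.

```python
import numpy as np

# Thresholds from the article: angle < 30 degrees AND win-rate drop > 3%.
ANGLE_LIMIT_DEG = 30.0
WIN_RATE_DROP_LIMIT = 0.03

def angle_deg(a: np.ndarray, b: np.ndarray) -> float:
    """Angle in degrees between two flattened weight vectors."""
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0))))

def should_rollback(new_w: np.ndarray, existing_ws: list,
                    old_win_rate: float, new_win_rate: float) -> bool:
    """Rollback only if the new adapter nearly duplicates an existing
    one (angle under 30 degrees) AND benchmark quality regressed."""
    too_similar = any(angle_deg(new_w, w) < ANGLE_LIMIT_DEG for w in existing_ws)
    regressed = (old_win_rate - new_win_rate) > WIN_RATE_DROP_LIMIT
    return too_similar and regressed

# Example: a near-duplicate adapter whose win rate also dropped 5 points.
base = np.ones(8)
near_copy = base + 0.01 * np.arange(8)
print(should_rollback(near_copy, [base], old_win_rate=0.80, new_win_rate=0.75))  # True
```

Requiring both conditions means a redundant-but-harmless adapter, or a regression caused by something other than interference, does not trigger a rollback on its own.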


Section 05

Verified Domain Coverage

Currently, 10 SFT domain adapters have been trained, with partial domain data as follows:

Domain        Training samples   Final loss   Typical scenario
kicad-dsl     694                0.42         PCB design
spice-sim     368                0.38         Circuit simulation
stm32         711                0.44         Firmware development
electronics   1900               0.43         General electronic engineering

Among them, the four domains of SPICE, STM32, electronics, and DSP have passed the forgetting-gate test, with good cross-domain compatibility.

Section 06

Deployment and Inference Solutions

Two deployment solutions are provided:

  • Mac Studio: MLX framework with Q4_K_M quantization; set memory/cache limits to avoid GPU suspension;
  • RTX 4090: vLLM with AWQ quantization; 24 GB of VRAM can hold the base model plus 2-4 adapters, with inference at 30-50 tokens/second. Consumer-grade GPUs are not recommended for training, which requires over 100 GB of memory.
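A back-of-envelope check shows why the 24 GB budget works for inference but not training. This sketch assumes 4-bit AWQ weights and fp16 adapters; the per-adapter parameter count is a rough assumption, and real usage adds KV cache and activation overhead on top.

```python
# Rough VRAM budget for the RTX 4090 deployment path described above.
TOTAL_PARAMS = 35e9          # Qwen3.6-35B-A3B total parameter count
AWQ_BITS = 4                 # 4-bit weight quantization
LORA_PARAMS = 50e6           # assumed size of one rank-16 adapter
LORA_BYTES_PER_PARAM = 2     # fp16 adapter weights

def gb(n_bytes: float) -> float:
    return n_bytes / 1024**3

base_gb = gb(TOTAL_PARAMS * AWQ_BITS / 8)
adapter_gb = gb(LORA_PARAMS * LORA_BYTES_PER_PARAM)

for n_adapters in (2, 4):
    total = base_gb + n_adapters * adapter_gb
    print(f"base + {n_adapters} adapters ~ {total:.1f} GB (budget: 24 GB)")
```

The quantized base model alone lands around 16 GB, and each adapter adds only a fraction of a gigabyte, leaving headroom for the KV cache; BF16 training of the full model, by contrast, needs two bytes per parameter plus optimizer and gradient state, which is why the article's 107 GB training peak rules out consumer cards.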

Section 07

Open Source Ecosystem and Related Projects

micro-kiki belongs to the Hypneum Lab ecosystem:

  • KIKI-Mac_tunner: Training execution and MLX pipeline;
  • nerve-wml: Neural protocol advisor bridging;
  • dream-of-kiki: Sister project for dream-like knowledge integration.

The dataset (489K samples), the lightweight 4B model, and the full 35B model (including adapters) have been released on Hugging Face.

Section 08

Project Summary and Value

micro-kiki shows that LoRA composition, intelligent routing, and a cognitive architecture can deliver deep coverage of many domains on consumer-grade hardware. Its forgetting gate and bias-filtering mechanisms offer a methodology for developing domain-expert models, and the project is worth following and contributing to for engineers and researchers deploying AI in technical fields.