Zing Forum


LLM Wiki Agent: Open-source Implementation of Karpathy's Wiki Mode, Supporting Offline Operation

A knowledge management agent built on the LLM Wiki mode proposed by Andrej Karpathy, using a medallion architecture for hierarchical storage, supporting hybrid deployment of local Ollama inference and cloud Gemini, and fully usable offline.

Tags: Knowledge Management · Wiki · Ollama · Local Inference · Knowledge Graph · Obsidian · Karpathy
Published 2026-04-22 00:14 · Recent activity 2026-04-22 00:29 · Estimated read 5 min

Section 01

Introduction / Main Floor

A knowledge management agent built on the LLM Wiki mode proposed by Andrej Karpathy, using a medallion architecture for hierarchical storage, supporting hybrid deployment of local Ollama inference and cloud Gemini, and fully usable offline.


Section 02

Origin: Karpathy's Wiki Mode

LLM Wiki Agent originates directly from a design pattern shared by Andrej Karpathy. The core idea is to organize the knowledge base into Wiki-style single-concept pages, connect concepts via [[wiki links]], and use graph traversal as the navigation mechanism.

Unlike traditional document management systems, the Wiki mode emphasizes breaking knowledge into discrete, interlinked concepts. Each page carries exactly one core concept and forms a knowledge network with other concepts through links. This structure is a natural fit for how large language models understand and reason over a knowledge base.
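The page-plus-links structure described above can be sketched in a few lines. This is a hypothetical illustration, not code from the project: the regex, function names, and traversal policy are all assumptions about how [[wiki link]] extraction and graph navigation might work.

```python
# Illustrative sketch: extract [[wiki links]] from pages and traverse the
# resulting concept graph. All names here are hypothetical.
import re
from collections import defaultdict, deque

# Matches [[Target]] and [[Target|display text]], capturing only the target.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")

def build_graph(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each page title to the set of page titles it links to."""
    graph = defaultdict(set)
    for title, body in pages.items():
        for target in WIKI_LINK.findall(body):
            graph[title].add(target.strip())
    return graph

def neighborhood(graph: dict[str, set[str]], start: str, depth: int = 2) -> set[str]:
    """Breadth-first traversal: all concepts reachable within `depth` hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen
```

An agent answering a question about one concept could call `neighborhood` to pull in the surrounding pages as context, which is the navigation mechanism the pattern relies on.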


Section 03

Core Architecture: Medallion Hierarchical Storage

The project implements a medallion architecture, dividing knowledge into three levels:


Section 04

Gold Layer (canon/)

A read-only core knowledge source containing verified authoritative information. The agent can read but will never overwrite content in this layer. In search ranking, Gold Layer results have the highest priority.


Section 05

Silver Layer (knowledge/wiki/)

A writable storage layer where the agent saves the knowledge it organizes, summarizes, and generates. This is the main place where the agent performs knowledge work; its search priority sits below the Gold Layer's.


Section 06

Bronze Layer (knowledge/raw/)

The raw data source: imported documents, web-scraped content, and other unprocessed material. As the raw feedstock for knowledge processing, it has the lowest search priority.

This layered design ensures knowledge quality control and traceability, preventing raw noise from contaminating the core knowledge base.
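The three-layer priority described above can be sketched as a layer-aware ranking step. This is a hypothetical sketch, assuming the directory names from the article (`canon/`, `knowledge/wiki/`, `knowledge/raw/`); the `Result` type and tie-breaking rule are illustrative, not the project's actual implementation.

```python
# Hypothetical sketch: rank search hits so Gold (canon/) outranks
# Silver (knowledge/wiki/), which outranks Bronze (knowledge/raw/).
from dataclasses import dataclass

# Lower number = higher priority; directory names come from the article.
LAYER_PRIORITY = {"canon": 0, "knowledge/wiki": 1, "knowledge/raw": 2}

@dataclass
class Result:
    path: str     # file path, prefixed by its layer directory
    score: float  # base relevance score (higher is better)

def layer_of(path: str) -> str:
    # Check longer prefixes first so "knowledge/wiki" wins over "knowledge/raw".
    for layer in sorted(LAYER_PRIORITY, key=len, reverse=True):
        if path.startswith(layer + "/"):
            return layer
    return "knowledge/raw"  # unrecognized paths default to lowest priority

def rank(results: list[Result]) -> list[Result]:
    # Sort by layer first, then by relevance score within a layer.
    return sorted(results, key=lambda r: (LAYER_PRIORITY[layer_of(r.path)], -r.score))
```

Under this scheme a mediocre Gold hit still beats an excellent Bronze hit, which is exactly the quality-control property the layering is meant to enforce. A real system might instead blend layer priority and score as weights.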


Section 07

Hybrid Inference: Free Switch Between Local and Cloud

A key highlight of the project is its flexible choice of inference modes: users can combine local and cloud backends to match their privacy and performance needs:

| Usage Scenario | Dialogue Inference | Embedding Vectors | Network Requirement |
| --- | --- | --- | --- |
| Fully Local | Ollama (local) | Ollama (local) | None |
| Hybrid Mode | Ollama (local/cloud) | Gemini (cloud) | Embedding only |
| Fully Cloud | Ollama Cloud | Gemini (cloud) | Required |

This design breaks the false dichotomy of "local is slow" versus "cloud leaks data", letting users pick the mode that fits each task: sensitive content is processed by local models, while complex reasoning tasks can call on cloud capabilities.
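The routing decision described above can be sketched as a small policy function. This is an illustrative sketch only: the backend names, flags, and defaults are assumptions, not the project's actual configuration.

```python
# Hypothetical sketch: choose an inference route based on privacy and
# connectivity. Backend identifiers are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class InferenceRoute:
    chat_backend: str    # which backend answers the dialogue
    embed_backend: str   # which backend produces embedding vectors
    needs_network: bool

def choose_route(sensitive: bool, offline: bool) -> InferenceRoute:
    if offline or sensitive:
        # Fully local: both chat and embeddings stay on the machine.
        return InferenceRoute("ollama-local", "ollama-local", needs_network=False)
    # Hybrid default: local chat, cloud embeddings -- the network is
    # needed only for the embedding calls, matching the table above.
    return InferenceRoute("ollama-local", "gemini-cloud", needs_network=True)
```

A fully-cloud variant would simply return cloud backends for both fields; the point is that the decision is a per-request policy rather than a global setting.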


Section 08

Knowledge Graph: Six Edge Types

The project builds a rich knowledge graph, supporting six edge types to describe relationships between concepts:

  • SIMILAR: Concept Similarity
  • INTER_FILE: Inter-file Association
  • CROSS_DOMAIN: Cross-domain Connection
  • PARENT_CHILD: Hierarchical Subordination
  • REFERENCES: Citation Relationship
  • RELATES_TO: General Association

Each edge carries provenance metadata (link_text, link_kind, evidence), making the graph interpretable and traceable.
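An edge record combining the six types above with the provenance fields the article names might look like the following. The dataclass shape is an assumption for illustration; only the type names and the fields `link_text`, `link_kind`, and `evidence` come from the article.

```python
# Hypothetical sketch of a graph edge: one of the six edge types plus
# provenance fields. The structure itself is illustrative.
from dataclasses import dataclass
from enum import Enum

class EdgeType(Enum):
    SIMILAR = "similar"            # concept similarity
    INTER_FILE = "inter_file"      # inter-file association
    CROSS_DOMAIN = "cross_domain"  # cross-domain connection
    PARENT_CHILD = "parent_child"  # hierarchical subordination
    REFERENCES = "references"      # citation relationship
    RELATES_TO = "relates_to"      # general association

@dataclass
class Edge:
    src: str         # source page title
    dst: str         # target page title
    kind: EdgeType
    link_text: str   # the literal text of the originating link
    link_kind: str   # how the link arose, e.g. explicit wiki link vs inferred
    evidence: str    # snippet supporting the relationship

# Example edge: one page citing another via an explicit [[wiki link]].
e = Edge("LLM", "Transformer", EdgeType.REFERENCES,
         link_text="[[Transformer]]", link_kind="wiki",
         evidence="LLMs build on the [[Transformer]] architecture.")
```

Storing the evidence snippet on the edge means every relationship in the graph can be traced back to the text that produced it, which is the interpretability property the article highlights.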