Zing Forum

su-memory SDK: Building a Local-First AI Memory System with Causal Reasoning Capabilities

su-memory SDK is a local-first AI memory framework. Using VectorGraphRAG, spacetime indexing, and causal graph technologies, it achieves an 87.8% multi-hop reasoning recall rate and a 96% latency reduction, providing LLM applications with true multi-hop causal reasoning capabilities.

Tags: AI memory system, VectorGraphRAG, causal reasoning, local-first, multi-hop reasoning, RAG, vector database, temporal indexing, privacy protection, LangChain
Published 2026-04-25 21:09 · Recent activity 2026-04-25 21:18 · Estimated read: 6 min

Section 01

su-memory SDK Guide: Local-First AI Memory System with Causal Reasoning

su-memory SDK is a local-first AI memory framework that addresses the gaps traditional vector databases leave in causal reasoning, temporal awareness, and multi-hop association. Using VectorGraphRAG, spacetime indexing, and causal-graph technologies, it achieves an 87.8% multi-hop reasoning recall rate and a 96% latency reduction, providing LLM applications with true multi-hop causal reasoning capabilities.


Section 02

Current Status and Challenges of AI Memory Systems

Most current AI applications' memory layers are built on vector-similarity nearest-neighbor search, which can only handle 'find something similar' tasks and is ineffective for reasoning questions such as 'why' or 'what will happen next'. Traditional systems lack the core human memory capabilities of causal reasoning, temporal awareness, and multi-hop association; this is precisely the problem su-memory SDK aims to solve.
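The 'find similar only' limitation is easy to see in a toy example. A pure nearest-neighbor store ranks memories by embedding similarity, so a 'why did X happen' query surfaces memories that merely sound like X and never reaches the causally linked event. The vectors and memory texts below are invented purely for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy memory store: id -> (embedding, text)
memories = {
    "m1": ([0.9, 0.1], "server CPU spiked at 09:00"),
    "m2": ([0.8, 0.2], "CPU usage was high yesterday too"),
    "m3": ([0.1, 0.9], "a deploy at 08:55 enabled debug logging"),
}

def nearest(query_vec, k=1):
    """Plain nearest-neighbor retrieval: rank all memories by similarity."""
    ranked = sorted(memories.items(),
                    key=lambda kv: cosine(query_vec, kv[1][0]),
                    reverse=True)
    return [mid for mid, _ in ranked[:k]]

# "Why did the CPU spike?" embeds near the CPU-related memories, so
# similarity search returns m1/m2 and never the causal antecedent m3.
print(nearest([0.85, 0.15], k=2))  # ['m1', 'm2']
```

Answering the 'why' requires an explicit edge from m1 to m3, which is exactly what the causal-graph layer described in the next section provides.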


Section 03

Analysis of Core Technical Architecture

su-memory SDK adopts a 'four-in-one' architecture:

  1. VectorGraphRAG: Integrates vector retrieval with graph traversal, enabling efficient multi-hop reasoning via an HNSW index (m=32, efConstruction=64, efSearch=64) and vector quantization (FP32/FP16/INT8/Binary);
  2. SpacetimeIndex: Combines spatial location and temporal encoding, supporting spacetime multi-hop queries;
  3. MemoryGraph: Explicitly defines four causal relationships (cause/condition/result/sequence) to enhance interpretability;
  4. TemporalSystem: Implements temporal awareness, simulating the time decay characteristic of human memory.
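A minimal sketch of how points 3 and 4 could fit together: a graph with the four typed edges the article names, a multi-hop 'why' traversal over cause edges, and exponential time decay for temporal awareness. su-memory's actual API is not published in this article, so every class, method, and parameter name below is an assumption:

```python
from collections import deque

# Illustrative only: the real MemoryGraph/TemporalSystem interfaces
# are not shown in the article; all names here are hypothetical.
EDGE_TYPES = {"cause", "condition", "result", "sequence"}

class MemoryGraph:
    """Typed causal graph over memory nodes."""

    def __init__(self):
        self.edges = {}  # node -> [(edge_type, neighbor)]

    def link(self, src, edge_type, dst):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        self.edges.setdefault(src, []).append((edge_type, dst))

    def why(self, node, max_hops=3):
        """Answer a 'why' query by walking 'cause' edges up to max_hops."""
        chain, seen = [], {node}
        frontier = deque([(node, 0)])
        while frontier:
            cur, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for etype, nxt in self.edges.get(cur, []):
                if etype == "cause" and nxt not in seen:
                    seen.add(nxt)
                    chain.append(nxt)
                    frontier.append((nxt, depth + 1))
        return chain

def decayed_score(similarity, age_seconds, half_life=86400):
    """TemporalSystem-style decay: a memory's relevance halves every half_life."""
    return similarity * 0.5 ** (age_seconds / half_life)

g = MemoryGraph()
g.link("cpu_spike", "cause", "debug_logging")
g.link("debug_logging", "cause", "deploy_0855")
print(g.why("cpu_spike"))         # ['debug_logging', 'deploy_0855']
print(decayed_score(1.0, 86400))  # 0.5
```

The typed edges are what make results interpretable: each hop in the returned chain carries an explicit relationship rather than an opaque similarity score.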

Section 04

Performance Data and Engineering Practice

The project's released performance benchmark data:

  • Query latency: P50=19ms (96% reduction compared to pre-optimization), P95=76ms;
  • Throughput: 94 inserts per second, with ~10.66ms processing time per item;
  • Memory usage: 1.53MB for 1000 memories;
  • Multi-hop recall rate: 87.8% (46% improvement over baseline).
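For readers unfamiliar with the notation, P50/P95 are the 50th and 95th percentiles of per-query latency, typically read off a sorted sample with a nearest-rank rule. The latencies below are synthetic and not the project's benchmark data:

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile over a sample of latencies (in ms)."""
    s = sorted(samples_ms)
    idx = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[idx]

# Synthetic per-query latencies for illustration only.
latencies_ms = [12, 15, 17, 18, 19, 24, 30, 45, 60, 76]
print(percentile(latencies_ms, 50))  # 19
print(percentile(latencies_ms, 95))  # 76
```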

Section 05

Version Strategy and Application Scenarios

su-memory ships in two editions, Lite and LitePro:

  • Lite: TF-IDF/N-gram retrieval, memory <5MB, suitable for prototype validation;
  • LitePro: Integrates Ollama bge-m3 embeddings and supports the full VectorGraphRAG and spacetime indexing stack; memory <50MB, suitable for production environments.

Application scenarios include long-term dialogue systems, knowledge management tools, predictive applications, and multimodal AI; the SDK is also compatible with LangChain and VMC architectures.
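A pure-lexical retriever of the kind the Lite tier describes can be sketched in a few lines. This TF-IDF toy is illustrative only and is not su-memory's implementation; the documents and queries are invented:

```python
import math
from collections import Counter

# Toy memory store for a TF-IDF retriever (no embedding model needed,
# which is what keeps the Lite tier's footprint under 5MB).
docs = {
    "d1": "user prefers dark mode in the editor",
    "d2": "user asked about dark chocolate recipes",
    "d3": "editor crashed after the latest update",
}

def tokenize(text):
    return text.lower().split()

tokenized = {d: tokenize(t) for d, t in docs.items()}
# Document frequency: in how many docs does each term appear?
df = Counter(w for toks in tokenized.values() for w in set(toks))
N = len(docs)

def tfidf_score(query, doc_id):
    """Sum of tf * idf over query terms present in the document."""
    toks = tokenized[doc_id]
    tf = Counter(toks)
    score = 0.0
    for w in tokenize(query):
        if w in tf:
            score += (tf[w] / len(toks)) * math.log(N / df[w])
    return score

def search(query):
    """Return the best-scoring memory for a query."""
    return max(docs, key=lambda d: tfidf_score(query, d))

print(search("dark mode editor"))  # d1
```

Rare terms like 'mode' carry a high idf weight, so they dominate the ranking over common terms shared by several memories; LitePro replaces this lexical scoring with bge-m3 embeddings for semantic matching.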

Section 06

Business Model and Limitations

Licensing model: free for individuals (capped at 1,000 items); commercial use is paid, from 99 yuan/month up to 9,999 yuan for a private deployment. Limitations:

  • Scale limit: Enterprise version has an upper limit of 100,000 items, not suitable for large-scale document retrieval;
  • Ecosystem maturity: Community and toolchain are still under construction;
  • Dependency: LitePro requires Ollama to run local models, increasing deployment complexity.

Section 07

Summary and Selection Recommendations

su-memory represents the evolution of AI memory systems from storage-and-retrieval toward cognitive architecture. Its local-first approach, interpretability, and multimodal capabilities give it an advantage in privacy-sensitive and deep-reasoning scenarios. AI application developers who need 'understanding and reasoning' rather than just 'matching and retrieval' should evaluate it. In the future, local memory systems of this kind may become a standard component of next-generation AI applications.