Zing Forum


FindingLLMFeatures: Exploring Geometric Feature Representations in Large Language Models

An open-source project exploring multi-dimensional geometric feature representations in GPT-2 Small. It aims to discover geometric structures such as circles and rings formed by periodic concepts in the activation space, offering a new perspective for AI interpretability research.

Tags: Large Language Models · Interpretability · Feature Engineering · Geometric Representations · GPT-2 · Transformer · Machine Learning · Deep Learning · AI Safety
Published 2026-05-08 09:44 · Recent activity 2026-05-08 10:34 · Estimated read 5 min

Section 01

[Introduction] FindingLLMFeatures Project: Exploring Geometric Feature Representations in Large Language Models

FindingLLMFeatures is an open-source project that explores multi-dimensional geometric feature representations in GPT-2 Small. It aims to discover geometric structures such as circles and rings formed by periodic concepts in the activation space, challenging the traditional assumption of linear representations and offering a new perspective for AI interpretability research.


Section 02

Background and Theoretical Basis

For a long time, the field of AI interpretability has assumed that large language models represent concepts as one-dimensional linear directions, but recent studies have challenged this view. The project builds on two 2024 papers: Engels et al. showed that language models encode periodic concepts (such as the days of the week) using circles, and Marks et al. examined the linear structure of true/false datasets. Core hypothesis: the middle and later layers of GPT-2 Small encode periodic and relational data as circles, rings, or lattices.
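As a toy illustration of why a circle (rather than a line) suits periodic concepts, the sketch below places the seven days of the week at evenly spaced angles in a 2-D plane. This is synthetic geometry, not actual GPT-2 activations; it only demonstrates the property a circular representation would have.

```python
import numpy as np

# Toy illustration (not real activations): if a model encodes the days of
# the week on a circle, day d sits at angle 2*pi*d/7, i.e. at the 2-D
# position (cos, sin) within some activation subspace.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
angles = 2 * np.pi * np.arange(7) / 7
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# On a circle, every day is equidistant from its neighbors, and the week
# wraps around: Sunday is exactly as close to Monday as Monday is to Tuesday.
dists = [np.linalg.norm(points[(i + 1) % 7] - points[i]) for i in range(7)]
print(np.allclose(dists, dists[0]))  # True: closed, periodic structure
```

A one-dimensional linear encoding (0 through 6 on a line) cannot capture this wrap-around: there, Sunday would be six steps away from Monday instead of one.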


Section 03

Research Methodology

A discovery-driven approach is adopted, with the following steps:
1. Activation extraction: use TransformerLens or nnsight to extract activations from the residual stream of GPT-2 Small.
2. Manifold search: clustering (e.g., K-Means) plus PCA, focusing on clusters where the variances of the first two principal components are similar and high.
3. Validation: compare a linear probe against a circular probe (fitting sin θ and cos θ); if the circular probe's loss is lower, a non-linear feature has been identified.
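The screening and validation steps above can be sketched on synthetic data. The snippet plants a circular feature in a random 2-D plane of a 768-dimensional space (standing in for real residual-stream activations, which would come from TransformerLens), then checks the PCA-variance criterion and compares a linear probe against a circular probe. It uses plain NumPy (SVD and least squares) where the project would use scikit-learn; everything here is synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic stand-in for residual-stream activations (d_model = 768). ---
# Plant a 2-D circle in a random plane of the 768-dim space, plus noise,
# mimicking what a circular feature would look like if GPT-2 had one.
n, d = 500, 768
theta = rng.uniform(0, 2 * np.pi, size=n)
plane, _ = np.linalg.qr(rng.normal(size=(d, 2)))       # random orthonormal 2-D plane
acts = np.stack([np.cos(theta), np.sin(theta)], 1) @ plane.T
acts += 0.05 * rng.normal(size=(n, d))                 # small isotropic noise

# --- Step 2: PCA screening (via SVD; the project would use scikit-learn). ---
# A circle shows up as two leading components with similar, high variance.
centered = acts - acts.mean(0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
var = s**2 / n
print(var[0] / var[1] < 1.5 and var[1] / var[2] > 5)   # two dominant, similar PCs

# --- Step 3: linear probe vs circular probe. ---
X = np.hstack([centered, np.ones((n, 1))])             # affine probes
lin_w, *_ = np.linalg.lstsq(X, theta, rcond=None)      # predict theta directly
circ_w, *_ = np.linalg.lstsq(
    X, np.stack([np.sin(theta), np.cos(theta)], 1), rcond=None
)

theta_lin = X @ lin_w
pred = X @ circ_w
theta_circ = np.arctan2(pred[:, 0], pred[:, 1]) % (2 * np.pi)

def angular_mse(a, b):
    diff = np.angle(np.exp(1j * (a - b)))              # wrap error to [-pi, pi]
    return np.mean(diff**2)

# The circular probe recovers the angle almost exactly; a linear probe
# cannot model the wrap-around from 2*pi back to 0.
print(angular_mse(theta_circ, theta) < angular_mse(theta_lin, theta))  # True
```

On real activations, the angle θ would come from the periodic label itself (e.g., day-of-week index mapped to 2πd/7), and a lower circular-probe loss would flag the cluster as a candidate non-linear feature.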


Section 04

Expected Findings and Challenges

Expected findings include circular representations of time-related concepts and star-shaped structures for linguistic categories (e.g., a central verb connected to its tense conjugations). Challenges faced:
1. The high-dimensional activation space (768 dimensions) makes the search difficult.
2. Once a geometric structure is identified, automated annotation is needed to determine which concept it corresponds to.
3. Pseudo-geometric structures produced by softmax or positional encoding must be guarded against.


Section 05

Technical Implementation and Toolchain

Built on the Python ecosystem, the core tools are TransformerLens and nnsight (Transformer interpretability), scikit-learn (PCA and clustering), and matplotlib/plotly (visualization). The code is modularized into an activation-extraction module, a geometric-analysis module, a probing-validation module, and a visualization module.


Section 06

Significance for AI Interpretability

If multi-dimensional geometric structures are confirmed to be widespread, the research paradigm of AI interpretability will change. In practice, this could enable more precise intervention methods (current representation editing relies on linear assumptions) and provide a new theoretical basis for model compression and knowledge distillation.


Section 07

Conclusion and Outlook

FindingLLMFeatures represents an important shift in AI interpretability, from linear assumptions toward geometric understanding. Although the project is at an early stage, its methodology opens up new possibilities. Further discoveries of geometric properties will help explain how these models work and provide key insights for building safer, more controllable AI systems.