Zing Forum


FindingLLMFeatures: Exploring Geometric Feature Representations in Large Language Models

An open-source project exploring multi-dimensional geometric feature representations in GPT-2 Small. It aims to discover geometric structures such as circles and rings formed by periodic concepts in the activation space, offering a new perspective for AI interpretability research.

Tags: Large Language Models · Interpretability · Feature Engineering · Geometric Representations · GPT-2 · Transformer · Machine Learning · Deep Learning · AI Safety
Published 2026-05-08 09:44 · Recent activity 2026-05-08 10:34 · Estimated read 5 min

Section 01

[Introduction] FindingLLMFeatures Project: Exploring Geometric Feature Representations in Large Language Models

FindingLLMFeatures is an open-source project that explores multi-dimensional geometric feature representations in GPT-2 Small. It aims to discover geometric structures such as circles and rings formed by periodic concepts in the activation space, challenging the traditional assumption of linear representations and offering a new perspective for AI interpretability research.


Section 02

Background and Theoretical Basis

For a long time, the field of AI interpretability has assumed that large language models represent concepts as one-dimensional linear directions, but recent studies have challenged this view. The project builds on two 2024 papers: Engels et al. showed that language models encode periodic concepts (such as the days of the week) using circles, and Marks et al. examined the linear structure of true/false datasets. Core hypothesis: the middle and later layers of GPT-2 Small encode periodic and relational data as circles, rings, or lattices.
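As a toy illustration of why a circle (rather than a line) suits periodic concepts, the sketch below places the seven days of the week at evenly spaced angles in a 2-D plane. This is synthetic geometry, not actual GPT-2 activations; it only demonstrates the property a circular representation would have.

```python
import numpy as np

# Toy illustration (not real activations): if a model encodes the days of
# the week on a circle, day d sits at angle 2*pi*d/7, i.e. at the 2-D
# position (cos, sin) within some activation subspace.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
angles = 2 * np.pi * np.arange(7) / 7
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# On a circle, every day is equidistant from its neighbors, and the week
# wraps around: Sunday is exactly as close to Monday as Monday is to Tuesday.
dists = [np.linalg.norm(points[(i + 1) % 7] - points[i]) for i in range(7)]
print(np.allclose(dists, dists[0]))  # True: closed, periodic structure
```

A one-dimensional linear encoding (0 through 6 on a line) cannot capture this wrap-around: there, Sunday would be six steps away from Monday instead of one.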


Section 03

Research Methodology

A discovery-driven approach is adopted, with the following steps:
1. Activation extraction: use TransformerLens or nnsight to extract activations from the residual stream of GPT-2 Small.
2. Manifold search: clustering (e.g., K-Means) plus PCA, focusing on clusters where the variances of the first two principal components are similar and high.
3. Validation: compare a linear probe against a circular probe (fitting sin θ and cos θ); if the circular probe's loss is lower, a non-linear feature has been identified.
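The screening and validation steps above can be sketched on synthetic data. The snippet plants a circular feature in a random 2-D plane of a 768-dimensional space (standing in for real residual-stream activations, which would come from TransformerLens), then checks the PCA-variance criterion and compares a linear probe against a circular probe. It uses plain NumPy (SVD and least squares) where the project would use scikit-learn; everything here is synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic stand-in for residual-stream activations (d_model = 768). ---
# Plant a 2-D circle in a random plane of the 768-dim space, plus noise,
# mimicking what a circular feature would look like if GPT-2 had one.
n, d = 500, 768
theta = rng.uniform(0, 2 * np.pi, size=n)
plane, _ = np.linalg.qr(rng.normal(size=(d, 2)))       # random orthonormal 2-D plane
acts = np.stack([np.cos(theta), np.sin(theta)], 1) @ plane.T
acts += 0.05 * rng.normal(size=(n, d))                 # small isotropic noise

# --- Step 2: PCA screening (via SVD; the project would use scikit-learn). ---
# A circle shows up as two leading components with similar, high variance.
centered = acts - acts.mean(0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
var = s**2 / n
print(var[0] / var[1] < 1.5 and var[1] / var[2] > 5)   # two dominant, similar PCs

# --- Step 3: linear probe vs circular probe. ---
X = np.hstack([centered, np.ones((n, 1))])             # affine probes
lin_w, *_ = np.linalg.lstsq(X, theta, rcond=None)      # predict theta directly
circ_w, *_ = np.linalg.lstsq(
    X, np.stack([np.sin(theta), np.cos(theta)], 1), rcond=None
)

theta_lin = X @ lin_w
pred = X @ circ_w
theta_circ = np.arctan2(pred[:, 0], pred[:, 1]) % (2 * np.pi)

def angular_mse(a, b):
    diff = np.angle(np.exp(1j * (a - b)))              # wrap error to [-pi, pi]
    return np.mean(diff**2)

# The circular probe recovers the angle almost exactly; a linear probe
# cannot model the wrap-around from 2*pi back to 0.
print(angular_mse(theta_circ, theta) < angular_mse(theta_lin, theta))  # True
```

On real activations, the angle θ would come from the periodic label itself (e.g., day-of-week index mapped to 2πd/7), and a lower circular-probe loss would flag the cluster as a candidate non-linear feature.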


Section 04

Expected Findings and Challenges

Expected findings include circular representations of time-related concepts and star-shaped structures for linguistic categories (e.g., a central verb connected to its tense conjugations). Challenges faced:
1. The high-dimensional activation space (768 dimensions) makes the search difficult.
2. Once a geometric structure is identified, automated annotation is needed to determine which concept it corresponds to.
3. Pseudo-geometric structures produced by softmax or positional encoding must be guarded against.


Section 05

Technical Implementation and Toolchain

Built on the Python ecosystem, the core tools are TransformerLens and nnsight (Transformer interpretability), scikit-learn (PCA and clustering), and matplotlib/plotly (visualization). The code is modularized into an activation-extraction module, a geometric-analysis module, a probing-validation module, and a visualization module.


Section 06

Significance for AI Interpretability

If multi-dimensional geometric structures are confirmed to be widespread, the research paradigm of AI interpretability will change. In practice, this could enable more precise intervention methods (current representation editing relies on linear assumptions) and provide a new theoretical basis for model compression and knowledge distillation.


Section 07

Conclusion and Outlook

FindingLLMFeatures represents an important shift in AI interpretability, from linear assumptions toward geometric understanding. Although the project is at an early stage, its methodology opens up new possibilities. Further discoveries of geometric properties will help explain how these models work and provide key insights for building safer, more controllable AI systems.