# Flywheel Concept: Can Neural Networks Truly 'See' Conceptual Structures? A Pre-Registered Falsifiable Study

> The Flywheel Concept proposes a rigorous pre-registered research framework. Through cross-model latent space alignment experiments, it tests whether neural network activations truly reflect the geometric structure of concepts or are merely a byproduct of shared training corpora.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-10T08:25:08.000Z
- Last activity: 2026-05-10T08:29:05.692Z
- Popularity: 143.9
- Keywords: neural network interpretability, cross-model alignment, latent space geometry, pre-registration, falsifiable research, concept geometry, Flywheel Concept, Platonic Representation Hypothesis, manifold learning
- Page link: https://www.zingnex.cn/en/forum/thread/flywheel-concept
- Canonical: https://www.zingnex.cn/forum/thread/flywheel-concept
- Markdown source: floors_fallback

---

## Flywheel Concept: A Pre-Registered Falsifiable Study on Neural Networks' Concept Structure Perception

This post introduces the Flywheel Concept, a rigorous pre-registered research framework aiming to answer a core question: Do neural network activation spaces truly reflect underlying concept structures, or are they merely artifacts of training corpora? The project focuses on cross-model latent space alignment experiments with a clear falsifiable 'bridge claim' and strict pre-registration rules to ensure scientific integrity.

## Research Background: From Platonic Hypothesis to Falsifiability Need

Recent interpretability studies (e.g., the Platonic Representation Hypothesis by Huh et al., 2024, and Goodfire AI's Manifold Guidance Project, 2025-2026) suggest that neural networks may converge to shared latent structures. However, the project's creator, velvetmonkey, notes that correlation is not causation: similar geometry could stem from shared training corpora rather than from real concept structures. This concern drives the project's focus on falsifiability.

## Core Claim & Experimental Design

The core 'bridge claim' states: cross-model latent alignment under structural transformations should predict task transfer performance with ΔR² ≥ 0.10 (95% CI excluding 0) in at least 2 of the 3 task domains, and the result must also hold for the code-intensive Qwen-Coder.

**Task Domains**: BATS semantic subset (relational language), WordNet classification distance (hierarchical structure), color ring sorting (perceptual geometry).

**Model Matrix**: Llama-3.1-8B, Gemma-2-9B, Pythia-12B, Qwen2.5-Coder-7B (cross-distribution test), Mistral-7B.
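
The decision rule in the bridge claim can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the project's pre-registered analysis: it assumes ΔR² means R²(aligned predictor) minus R²(baseline predictor) on the same held-out items, uses a percentile bootstrap for the 95% CI, and counts a domain as passing when the point estimate is at least 0.10 and the CI lies above 0. All function names are hypothetical.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def delta_r2_ci(y, pred_aligned, pred_baseline, n_boot=2000, seed=0):
    """Point estimate and percentile-bootstrap 95% CI for R2(aligned) - R2(baseline)."""
    rng = np.random.default_rng(seed)
    y, pa, pb = (np.asarray(a, dtype=float) for a in (y, pred_aligned, pred_baseline))
    n = len(y)
    point = r2_score(y, pa) - r2_score(y, pb)
    deltas = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample items with replacement
        deltas[i] = r2_score(y[idx], pa[idx]) - r2_score(y[idx], pb[idx])
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return point, lo, hi

def domain_passes(point, lo, hi, threshold=0.10):
    """Assumed rule: effect at least `threshold` and the 95% CI lies above 0."""
    return bool(point >= threshold and lo > 0.0)

def bridge_claim_holds(results_by_domain, min_domains=2):
    """results_by_domain maps a domain name to its (point, lo, hi) tuple."""
    passed = sum(domain_passes(*r) for r in results_by_domain.values())
    return passed >= min_domains
```

Under this reading, `bridge_claim_holds` would be evaluated once over the three task domains and again on the results restricted to Qwen2.5-Coder-7B. How the source actually combines the two conditions is not specified, so that split is an assumption.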

## Baselines & Falsification Mechanism

**Baselines**:

- B1: Single-model linear/MLP probe (tests whether alignment beats corpus artifacts).
- B2: Cross-model linear probe transfer (from Conneau et al.'s cross-lingual work).

The bridge claim must beat both baselines with ΔR² ≥ 0.10.
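
To make baseline B2 more tangible, here is a minimal sketch of cross-model linear probe transfer on synthetic data. It is not the project's implementation. It assumes a ridge-regression probe trained on model A's activations, a least-squares linear map from model B's activation space into model A's (fitted on activations for the same stimuli), and evaluation of the A-trained probe on mapped B activations. The synthetic activations and all names are hypothetical.

```python
import numpy as np

def fit_linear_probe(X, y, ridge=1e-3):
    """Ridge-regression probe with a bias column; returns weights of shape (d + 1,)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def apply_probe(w, X):
    """Apply a probe fitted by fit_linear_probe to activations X."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def fit_alignment(X_src, X_tgt):
    """Least-squares linear map M such that X_src @ M approximates X_tgt."""
    M, *_ = np.linalg.lstsq(X_src, X_tgt, rcond=None)
    return M

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def demo_transfer(seed=0, n=400, k=4, d_a=10, d_b=12, noise=0.05):
    """Synthetic check: two 'models' embed the same k-dim latent in different spaces."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, k))                      # shared latent structure
    X_a = Z @ rng.normal(size=(k, d_a)) + noise * rng.normal(size=(n, d_a))
    X_b = Z @ rng.normal(size=(k, d_b)) + noise * rng.normal(size=(n, d_b))
    y = Z @ rng.normal(size=k)                       # task target
    train, test = slice(0, n // 2), slice(n // 2, n)
    w = fit_linear_probe(X_a[train], y[train])       # probe trained on model A only
    M = fit_alignment(X_b[train], X_a[train])        # map B -> A on shared stimuli
    pred = apply_probe(w, X_b[test] @ M)             # A's probe on mapped B activations
    return r2_score(y[test], pred)
```

A general least-squares map is used here because the models in the matrix have different hidden sizes. An orthogonal (Procrustes) map is another common choice when dimensions match.
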

**Falsification**: The protocol is frozen before any experiment runs; any post-hoc change counts as automatic falsification. Negative results are valid outcomes.
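
One simple way to make a frozen protocol checkable is to publish a cryptographic digest of it before running anything. The sketch below is an assumption about how such a check could look, not a description of the project's tooling. The protocol fields are illustrative placeholders drawn from the claim above, and the function names are hypothetical.

```python
import hashlib
import json

def protocol_digest(protocol: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(
        protocol, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_frozen(protocol: dict, frozen_digest: str) -> bool:
    """True only if the protocol is byte-for-byte the one that was frozen."""
    return protocol_digest(protocol) == frozen_digest

# Hypothetical protocol record; field names are illustrative only.
protocol = {
    "delta_r2_threshold": 0.10,
    "ci_level": 0.95,
    "min_domains": 2,
    "domains": ["BATS", "WordNet", "color"],
}
frozen = protocol_digest(protocol)  # published before any experiment runs
```

Any later edit to the protocol, such as lowering the threshold, changes the digest, so `verify_frozen` returns `False` and the change is detectable by anyone holding the published value.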

## Theoretical Position & Academic Debts

Flywheel Concept clarifies that it is not a universal semantic system, a financial product, or a cosmological claim. It focuses on 'instrument fidelity': showing that the measurement tool is unbiased before using it to make predictions.

**Academic Debts**: The project builds on Goodfire AI's manifold guidance, @slashreboot's introspective probes, Hindupur et al.'s NeurIPS 2025 instrument fidelity work, and Anthropic NLA's introspective decoding baselines.

## Conclusion & Next Steps

The project is currently in the pre-registration draft phase, with no pilot runs yet. The team commits to publishing results regardless of outcome. The project offers a template for translating philosophical intuitions into falsifiable experiments and pushes back against the positive-results-only bias in ML research.