Zing Forum

Flywheel Concept: Can Neural Networks Truly 'See' Conceptual Structures? A Pre-Registered Falsifiable Study

The Flywheel Concept proposes a rigorous pre-registered research framework. Through cross-model latent space alignment experiments, it tests whether neural network activations truly reflect the geometric structure of concepts or are merely a byproduct of shared training corpora.

Tags: neural network interpretability, cross-model alignment, latent space geometry, pre-registration, falsifiable research, concept geometry, Flywheel Concept, Platonic Representation Hypothesis, manifold learning
Published 2026-05-10 16:25 | Recent activity 2026-05-10 16:29 | Estimated read 4 min

Section 01

Flywheel Concept: A Pre-Registered, Falsifiable Study of Whether Neural Networks Perceive Concept Structure

This post introduces the Flywheel Concept, a rigorous pre-registered research framework built to answer one core question: do neural network activation spaces truly reflect underlying concept structure, or are they merely artifacts of the training corpora? The project centers on cross-model latent space alignment experiments, with a clearly falsifiable 'bridge claim' and strict pre-registration rules to safeguard scientific integrity.

Section 02

Research Background: From the Platonic Representation Hypothesis to the Need for Falsifiability

Recent interpretability work, such as the Platonic Representation Hypothesis (Huh et al., 2024) and Goodfire AI's Manifold Guidance project (2025-2026), suggests that neural networks may converge on shared latent structures. However, the project's creator, velvetmonkey, points out that correlation is not causation: similar geometry could stem from shared training corpora rather than from genuine concept structure. That gap is what drives the project's focus on falsifiability.
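
To make "similar geometry" concrete, here is a minimal sketch of one way to score it: the mutual k-nearest-neighbour overlap between two models' embeddings of the same items, loosely in the spirit of the nearest-neighbour alignment metric used by Huh et al. The function names, the choice of k, and the toy "models" are illustrative assumptions, not the project's code. Note what the score cannot tell you: a high value looks the same whether it comes from shared corpora or from real concept structure, which is exactly the objection above.

```python
import math
import random


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest other points (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in ranked[:k]})
    return result


def mutual_knn_alignment(feats_a, feats_b, k=3):
    """Mean fraction of shared k-NN between two models' embeddings of the same items.

    feats_a[i] and feats_b[i] must embed the same input i. A score of 1.0 means
    identical neighbourhoods; about k / (n - 1) is expected for unrelated embeddings.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both models must embed the same items")
    nn_a, nn_b = knn_sets(feats_a, k), knn_sets(feats_b, k)
    return sum(len(a & b) / k for a, b in zip(nn_a, nn_b)) / len(nn_a)


# Hypothetical toy data: 30 items embedded by "model A" in 4 dimensions.
rng = random.Random(0)
model_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
# "Model B" is a rescaled copy of A: same neighbourhood structure, different scale.
model_b = [[2.0 * x for x in p] for p in model_a]
# "Model C" is unrelated noise.
model_c = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]

print(mutual_knn_alignment(model_a, model_b))  # identical neighbourhoods
print(mutual_knn_alignment(model_a, model_c))  # near chance level
```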

Section 03

Core Claim & Experimental Design

The core 'bridge claim' states that cross-model latent alignment under structural transformations should predict task-transfer performance with ΔR² ≥ 0.10 (95% CI excluding 0) in at least two of the three task domains, and that the result must also hold for the code-heavy Qwen2.5-Coder.

Task domains:

  • BATS semantic subset (relational language)
  • WordNet taxonomic distance (hierarchical structure)
  • Color-wheel ordering (perceptual geometry)

Model matrix: Llama 3.1 8B, Gemma 2 9B, Pythia 12B, Qwen2.5-Coder 7B (cross-distribution test), and Mistral 7B.
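
The decision rule in the bridge claim can be written down as a small helper. This is a minimal sketch under stated assumptions: each observation is one model pair within one domain, carrying a baseline predictor (for example a probe-transfer score), the cross-model alignment score, and the observed transfer performance. ΔR² is taken as the R² gain from adding alignment to an ordinary least-squares regression on the baseline, and the 95% CI comes from a percentile bootstrap; both choices are assumptions, since the post does not specify the estimator. How "holds for Qwen-Coder" is operationalized is also not stated, so it is passed in as a hypothetical boolean. All names and data here are illustrative.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_r2(X, y):
    """R^2 of an ordinary-least-squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(A, b)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def delta_r2_with_ci(baseline, alignment, outcome, n_boot=1000, seed=0):
    """Delta R^2 = R^2(baseline + alignment) - R^2(baseline), with a percentile-bootstrap 95% CI."""
    n = len(outcome)

    def delta(idx):
        base = [[baseline[i]] for i in idx]
        full = [[baseline[i], alignment[i]] for i in idx]
        ys = [outcome[i] for i in idx]
        return ols_r2(full, ys) - ols_r2(base, ys)

    rng = random.Random(seed)
    boots = sorted(delta([rng.randrange(n) for _ in range(n)]) for _ in range(n_boot))
    ci = (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot) - 1])
    return delta(range(n)), ci


def domain_passes(delta, ci, threshold=0.10):
    """Per-domain test: Delta R^2 >= 0.10 and the 95% CI excludes 0."""
    return delta >= threshold and ci[0] > 0


def bridge_claim_supported(domain_results, qwen_coder_holds):
    """Supported only if at least 2 of 3 domains pass AND the Qwen2.5-Coder check holds."""
    passed = sum(domain_passes(d, ci) for d, ci in domain_results.values())
    return passed >= 2 and qwen_coder_holds


# Synthetic illustration only (not project data): performance driven mostly by alignment.
rng = random.Random(1)
base = [rng.random() for _ in range(40)]
align = [rng.random() for _ in range(40)]
perf = [0.5 * b + a + rng.gauss(0, 0.05) for b, a in zip(base, align)]
d, ci = delta_r2_with_ci(base, align, perf)
print(d, ci, domain_passes(d, ci))
```

One caveat the sketch ignores: model pairs that share a model are not independent observations, so a plain bootstrap over pairs may produce confidence intervals that are too narrow.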

Section 04

Baselines & Falsification Mechanism

Baselines:

  • B1: Single-model linear/MLP probe (tests whether alignment beats corpus artifacts).
  • B2: Cross-model linear probe transfer (adapted from Conneau et al.'s cross-lingual work).

The bridge claim must beat both baselines with ΔR² ≥ 0.10.

Falsification: the protocol is frozen before any experiment runs, and any post-hoc change counts as automatic falsification. Negative results are valid outcomes.
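
Baseline B2 can be sketched as follows, assuming paired activations (the same inputs run through both models) and one scalar target per item. A linear probe is fit on model A, a linear map from model B's activation space into model A's is fit on the same training items, and the composed map-then-probe is scored on held-out items. To stay dependency-free this sketch uses an unconstrained least-squares map; an orthogonal (Procrustes) map is a common alternative in cross-lingual embedding alignment. The function names, dimensions, and toy data are hypothetical and not the project's protocol.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_linear(X, Y):
    """Least-squares fit of each column of Y on [1, X]; one coefficient list per output."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    coefs = []
    for k in range(len(Y[0])):
        b = [sum(r[i] * yr[k] for r, yr in zip(rows, Y)) for i in range(p)]
        coefs.append(solve(A, b))
    return coefs


def predict(coefs, X):
    """Apply fitted coefficients (intercept first) to each row of X."""
    return [[c[0] + sum(ci * xi for ci, xi in zip(c[1:], x)) for c in coefs] for x in X]


def r2(y, preds):
    """Coefficient of determination of preds against y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, preds))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def probe_transfer_r2(acts_a, acts_b, labels, train_idx, test_idx):
    """B2 sketch: probe trained on model A, applied to model B via a learned B->A map."""
    probe = fit_linear([acts_a[i] for i in train_idx], [[labels[i]] for i in train_idx])
    bridge = fit_linear([acts_b[i] for i in train_idx], [acts_a[i] for i in train_idx])
    mapped = predict(bridge, [acts_b[i] for i in test_idx])
    preds = [p[0] for p in predict(probe, mapped)]
    return r2([labels[i] for i in test_idx], preds)


# Hypothetical toy data: 60 items, 3-dimensional activations.
rng = random.Random(0)
acts_a = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
# "Model B": same items under a signed coordinate permutation (an orthogonal transform) plus noise.
acts_b = [[-x[2] + rng.gauss(0, 0.01), x[0] + rng.gauss(0, 0.01), x[1] + rng.gauss(0, 0.01)] for x in acts_a]
labels = [2 * x[0] - x[1] + 0.5 * x[2] for x in acts_a]
print(probe_transfer_r2(acts_a, acts_b, labels, range(40), range(40, 60)))
```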
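
The freeze rule can also be enforced mechanically. A minimal sketch, assuming the protocol is captured as a JSON-serialisable dict and its SHA-256 fingerprint is published before any run; at analysis time the fingerprint is recomputed, and any mismatch triggers automatic falsification. The post does not say how its freeze is implemented, so the protocol fields and function names below are hypothetical.

```python
import hashlib
import json


def fingerprint(protocol):
    """SHA-256 of a canonical JSON encoding (sorted keys, no whitespace) of the protocol."""
    canonical = json.dumps(protocol, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def freeze_status(protocol, registered_hash):
    """'intact' if the protocol matches the registered fingerprint, else 'falsified'."""
    return "intact" if fingerprint(protocol) == registered_hash else "falsified"


# Hypothetical protocol fields, loosely mirroring the post's pre-registered choices.
protocol = {
    "delta_r2_threshold": 0.10,
    "ci_level": 0.95,
    "min_domains_passing": 2,
    "domains": ["BATS-semantic", "WordNet-distance", "color-wheel"],
    "models": ["Llama-3.1-8B", "Gemma-2-9B", "Pythia-12B", "Qwen2.5-Coder-7B", "Mistral-7B"],
}
registered = fingerprint(protocol)  # published before any experiment

print(freeze_status(protocol, registered))  # intact
edited = dict(protocol, delta_r2_threshold=0.05)  # a post-hoc change to the threshold
print(freeze_status(edited, registered))  # falsified
```

Canonical serialisation (sorted keys, fixed separators) matters here: without it, merely reordering keys would change the hash and trigger a spurious falsification.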

Section 05

Theoretical Position & Intellectual Debts

The Flywheel Concept states explicitly that it is not a universal semantic system, a financial product, or a cosmological claim. Its focus is 'instrument fidelity': demonstrating that the measurement tool is unbiased before using it to make predictions.

Intellectual debts: the project builds on Goodfire AI's manifold guidance work, @slashreboot's introspective probes, Hindupur et al.'s instrument-fidelity work (NeurIPS 2025), and Anthropic NLA's introspective decoding baselines.
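
One generic way to check instrument fidelity (not necessarily the method of Hindupur et al. or of this project) is to run the instrument on inputs where the correct answer is known to be "no shared structure". The sketch below assumes a simple representational-similarity instrument, the Pearson correlation between two models' pairwise-distance profiles over the same items, and compares its score against a null distribution built by shuffling which item in model B is paired with which item in model A. A faithful instrument should score near zero on shuffled pairings. Names and toy data are hypothetical.

```python
import math
import random


def pairwise_dists(points):
    """Upper-triangle Euclidean distances between all item pairs."""
    n = len(points)
    return [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rsa_score(feats_a, feats_b):
    """Similarity of the two models' distance geometry over the same items."""
    return pearson(pairwise_dists(feats_a), pairwise_dists(feats_b))


def permutation_null(feats_a, feats_b, n_perm=200, seed=0):
    """Observed score, null scores under shuffled item pairing, and a permutation p-value."""
    rng = random.Random(seed)
    observed = rsa_score(feats_a, feats_b)
    null = []
    for _ in range(n_perm):
        shuffled = feats_b[:]
        rng.shuffle(shuffled)  # break the item correspondence between models
        null.append(rsa_score(feats_a, shuffled))
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_perm)
    return observed, null, p_value


# Hypothetical toy data: model B is a rescaled, noisy copy of model A.
rng = random.Random(2)
feats_a = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(20)]
feats_b = [[1.5 * x + rng.gauss(0, 0.1) for x in p] for p in feats_a]
observed, null, p_value = permutation_null(feats_a, feats_b)
print(observed, sum(null) / len(null), p_value)
```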

Section 06

Conclusion & Next Steps

The project is currently in the pre-registration draft phase, with no pilot runs yet. The team commits to publishing results regardless of outcome. The project offers a template for turning philosophical intuitions into falsifiable experiments and pushes back against the 'positive results only' bias in ML research.