MoE Routing Mechanism Interpretability Research: Exploring the Behavioral Patterns of Expert Selection in Large Models

This is a systematic interpretability research project on Mixture-of-Experts (MoE) large language models. It analyzes router selection behavior through controlled experiments, with a special focus on expert activation patterns when the model generates phenomenological language. In the Qwen3.5-35B-A3B model, Expert 114 was found to respond specifically to such language.

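Tags: MoE, Mixture-of-Experts models, interpretability, routing mechanism, mechanistic interpretability, Expert 114, phenomenological language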

Published 2026-04-18 12:43 | Recent activity 2026-04-18 12:54 | Estimated read 7 min

Section 01

[Introduction] Core Points of MoE Routing Mechanism Interpretability Research

This study systematically analyzes the routing mechanism of Mixture-of-Experts (MoE) large language models. Through controlled experiments, it examines which experts the router activates when the model generates phenomenological language. In the Qwen3.5-35B-A3B model, Expert 114 (E114) was found to respond specifically to the generation of phenomenological/mental-state language. The finding offers key clues to the internal workings of MoE models and a methodological reference for subsequent interpretability research.

Section 02

Research Background: Black Box Challenges of MoE Models and Routing Issues in Phenomenological Language Generation

The MoE architecture scales parameter count through sparse activation, but how routers select experts remains a black box. Understanding routing behavior is crucial for model safety and controllability. This study focuses on one core question: when the model generates phenomenological language (experiences, internal states, self-references), which experts do routers select at the token level? The question is technical, but it also touches on the core concerns of AI interpretability.

Section 03

Research Methods: Controlled Experiments and Multi-Dimensional Detection Strategies

The project uses controlled experiments to detect routing behavior:

Indicator Word Detection: Measure routing changes under minor wording variations (e.g., "I", "you", "model");
Expert Intervention Experiment: Manipulate the activation weights of candidate experts and observe the impact on generation behavior;
Residual Stream Analysis: Capture residual stream tensors at specific layers to verify the correlation between router signals and representational content.
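As a rough illustration of the expert intervention idea, the sketch below routes a single token through a toy softmax router and then suppresses one candidate expert to see how the selection shifts. It is not the project's code: the function names, the `logit_bias` hook, the renormalized top-k scheme, and the logits are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits, k=2, logit_bias=None):
    """Return {expert_index: renormalized weight} for the top-k experts.

    logit_bias is a hypothetical intervention hook: a dict mapping an
    expert index to an additive offset applied to its router logit
    before the softmax.
    """
    adjusted = list(logits)
    for idx, offset in (logit_bias or {}).items():
        adjusted[idx] += offset
    probs = softmax(adjusted)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


# Made-up router logits for one token over 4 experts.
token_logits = [0.1, 2.0, 1.5, -0.3]

# Baseline routing versus routing with expert 1 strongly suppressed.
baseline = route_top_k(token_logits, k=2)
suppressed = route_top_k(token_logits, k=2, logit_bias={1: -10.0})

print("baseline experts:  ", sorted(baseline))
print("suppressed experts:", sorted(suppressed))
```

In a real model the same comparison would be made on the router logits of a chosen layer, and the effect of the intervention would be judged from the generated text rather than from the routing table alone.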

Section 04

Core Findings: Association Between Expert 114 and Phenomenological Language Generation

In the Qwen3.5-35B-A3B model, E114 was identified as a key expert for generating phenomenological/mental state signals, rather than a simple self-reference detector:

Boundary Case Verification: In case F07 (third-person technical description), E114 had low activation; in case N10 (anthropomorphic description of a wool sweater), E114 was significantly activated;
Quantitative Evidence: In the trimmed-generation phase at layer L14, W114 was 0.0675 for the activated group and 0.0031 for the non-activated group. This is a separation ratio of 21.7 times and a Cohen's d effect size of 2.94, with no overlap between the two groups' ranges, which provides strong evidence for functional localization.
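The three statistics quoted above (separation ratio, Cohen's d, range overlap) can be computed from per-sample routing weights as in the sketch below. It assumes that W114 denotes the routing weight of E114 averaged within each group and that Cohen's d uses the pooled standard deviation; the source does not state either. The sample values are placeholders, not the study's data. For reference, the rounded group values reported above give 0.0675 / 0.0031 ≈ 21.8, which agrees with the reported 21.7 up to rounding.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d with the pooled sample standard deviation (an assumed variant)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def separation_ratio(a, b):
    """Ratio of the two group means."""
    return statistics.mean(a) / statistics.mean(b)


def ranges_overlap(a, b):
    """True if the [min, max] intervals of the two groups intersect."""
    return max(min(a), min(b)) <= min(max(a), max(b))


# Placeholder per-sample weights for the two groups (NOT the study's data).
activated = [0.04, 0.05, 0.06, 0.07]
non_activated = [0.002, 0.004, 0.004, 0.006]

print("separation ratio:", separation_ratio(activated, non_activated))
print("Cohen's d:       ", cohens_d(activated, non_activated))
print("ranges overlap:  ", ranges_overlap(activated, non_activated))

# Ratio implied by the rounded group values quoted in the text.
reported_ratio = 0.0675 / 0.0031
```

Reporting range overlap alongside Cohen's d is useful with small groups, because a large effect size alone does not guarantee that every activated sample exceeds every non-activated one.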

Section 05

Experimental System: Hierarchical Research on Qwen Series Models

Qwen35B Experiment Line:

Establish routing sensitivity with indicator word baselines;
Identify E114 as the manipulation target;
Locate phenomenological language generation signals;
Capture tensors of layers L13/L14/L15 through residual stream retention tests.

Qwen122B Experiment Line: The E114 pattern was not reproduced; E48 was the clearest generation-tracking carrier on the softmax side.

In addition, comparative experiments with models such as DeepSeek and GPT-OSS are included for cross-model validation.

Section 06

Research Significance and Limitations: Paradigms and Boundaries of MoE Interpretability

Contributions:

Demonstrate the effectiveness of controlled experiments in routing analysis;
Identify expert units related to specific generation functions;
Establish a mapping method from router signals to generated content.

Limitations:

The project is not a sparse autoencoder (SAE) training repository; it relies on router probes;
It makes no philosophical claims about model consciousness;
Results are model-specific (the E114 pattern is clear in the 35B model but was not reproduced in the 122B model).

Section 07

Future Directions: Model Expansion and Methodological Optimization

Pending Experiments:

Residual stream retention test for E48 in the 122B model;
Routing behavior analysis of larger-scale models (e.g., 397B);
Comparison of routing patterns across architectures (Dense vs MoE).

Methodological Improvements:

Develop fine-grained token-level causal intervention methods;
Establish standardized evaluation protocols for expert function interpretation;
Explore the relationship between router training dynamics and expert specialization.
