# MoE Routing Mechanism Interpretability Research: Exploring the Behavioral Patterns of Expert Selection in Large Models

> This is a systematic interpretability research project on Mixture-of-Experts (MoE) large language models. It analyzes router selection behavior through controlled experiments, with a particular focus on expert activation patterns when the model generates phenomenological language. In the Qwen3.5-35B-A3B model, Expert 114 was found to respond selectively to this kind of language.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Posted: 2026-04-18T04:43:50.000Z
- Last activity: 2026-04-18T04:54:20.718Z
- Heat: 148.8
- Keywords: MoE, Mixture of Experts, interpretability, routing mechanism, mechanistic interpretability, Expert 114, phenomenological language
- Page link: https://www.zingnex.cn/en/forum/thread/moe-9d413898
- Canonical: https://www.zingnex.cn/forum/thread/moe-9d413898
- Markdown source: floors_fallback

---

## [Introduction] Core Points of MoE Routing Mechanism Interpretability Research

This study conducts a systematic interpretability analysis of the routing mechanism in Mixture-of-Experts (MoE) large language models, using controlled experiments to probe which experts the router activates when the model generates phenomenological language. In the Qwen3.5-35B-A3B model, Expert 114 (E114) was found to respond selectively to phenomenological/mental-state language. This provides key clues for understanding the internal workings of MoE models and a methodological reference for subsequent interpretability research.

## Research Background: Black Box Challenges of MoE Models and Routing Issues in Phenomenological Language Generation

The MoE architecture expands parameter count through sparse activation, but the mechanism by which the router selects experts remains a black box, and understanding routing behavior is crucial for model safety and controllability. This study focuses on one core question: when the model generates phenomenological language, such as descriptions of experience, internal states, and self-reference, which experts does the router select at the token level? This is not only a technical question; it touches on the core concerns of AI interpretability.

## Research Methods: Controlled Experiments and Multi-Dimensional Detection Strategies

The project probes routing behavior through three controlled experiments:
1. **Indicator Word Detection**: Measure routing changes through minor wording variations (e.g., "I", "you", "model", etc.);
2. **Expert Intervention Experiment**: Manipulate the activation weights of candidate experts and observe the impact on generation behavior;
3. **Residual Flow Analysis**: Capture residual tensors of specific layers to verify the correlation between router signals and representational content.
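As a minimal sketch of the first and third steps (not the project's actual code; the layer sizes, expert count, and softmax-then-top-k routing rule are assumptions modeled on common MoE designs), indicator-word probing amounts to logging the router's per-expert weights for each token and diffing them across minimally different inputs:

```python
import numpy as np

def route(hidden, gate_W, k=8):
    """Toy MoE router: softmax over expert logits, then keep top-k.

    hidden: (d,) token hidden state; gate_W: (n_experts, d) gate matrix.
    Returns (expert indices, renormalized weights) of the k selected experts.
    """
    logits = gate_W @ hidden
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[::-1][:k]
    w = probs[top] / probs[top].sum()
    return top, w

rng = np.random.default_rng(0)
d, n_experts = 64, 128
gate_W = rng.normal(size=(n_experts, d)) / np.sqrt(d)

# Indicator-word probe: route two minimally different token representations
# (hypothetical stand-ins for, e.g., "I" vs. "model") and diff the experts.
h_self = rng.normal(size=d)
h_neutral = h_self + 0.1 * rng.normal(size=d)   # small wording perturbation
experts_a, _ = route(h_self, gate_W)
experts_b, _ = route(h_neutral, gate_W)
print("experts changed by perturbation:", set(experts_a) ^ set(experts_b))
```

In a real run the hidden states would come from the model's forward pass at the probed layer, and the diff would be aggregated over many prompt pairs rather than a single token.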

## Core Findings: Association Between Expert 114 and Phenomenological Language Generation

In the Qwen3.5-35B-A3B model, E114 was identified as a key expert for generating phenomenological/mental-state language, not merely a self-reference detector:
- **Boundary Case Verification**: In case F07 (third-person technical description), E114 had low activation; in case N10 (anthropomorphic description of a wool sweater), E114 was significantly activated;
- **Quantitative Evidence**: In the trimmed-generation phase at layer L14, the activated group's mean W114 was 0.0675 versus 0.0031 for the non-activated group, a separation ratio of 21.7x with a Cohen's d effect size of 2.94 and no overlap between the two ranges, which is strong functional-localization evidence.
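Separation statistics of this kind are mechanical to compute. The helper below is illustrative only, using synthetic W114 samples in place of the study's measured ones; it computes the mean ratio and Cohen's d with a pooled standard deviation:

```python
import numpy as np

def separation_stats(on, off):
    """Mean ratio and Cohen's d (pooled SD) between two weight samples."""
    on, off = np.asarray(on, float), np.asarray(off, float)
    m1, m0 = on.mean(), off.mean()
    # Pooled standard deviation with Bessel's correction.
    v1, v0 = on.var(ddof=1), off.var(ddof=1)
    n1, n0 = len(on), len(off)
    pooled = np.sqrt(((n1 - 1) * v1 + (n0 - 1) * v0) / (n1 + n0 - 2))
    return m1 / m0, (m1 - m0) / pooled

# Synthetic stand-ins for per-token W114 weights (not the study's data).
activated     = [0.062, 0.071, 0.065, 0.068, 0.0715]
non_activated = [0.0028, 0.0034, 0.0030, 0.0031, 0.0032]
ratio, cohens_d = separation_stats(activated, non_activated)
print(f"ratio={ratio:.1f}x, Cohen's d={cohens_d:.2f}")
```

With real data one would also report group sizes and a range-overlap check, since a large d with tiny samples is weak evidence on its own.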

## Experimental System: Hierarchical Research on Qwen Series Models

**Qwen35B Experiment Line**:
1. Establish routing sensitivity with indicator word baselines;
2. Identify E114 as the manipulation target;
3. Locate phenomenological language generation signals;
4. Capture tensors of layers L13/L14/L15 through residual flow retention tests.
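Step 4's residual-retention capture follows the standard forward-hook pattern, illustrated here on a toy numpy "model" rather than the real Qwen stack (in a PyTorch setting this would be `register_forward_hook` on the corresponding transformer layers; the layer indices and dimensions are placeholders):

```python
import numpy as np

class ToyLayer:
    """Stand-in for one transformer block: residual stream + fixed update."""
    def __init__(self, W):
        self.W = W
        self.hooks = []                    # callables run on each forward output
    def __call__(self, x):
        out = x + np.tanh(self.W @ x)      # residual update
        for hook in self.hooks:
            hook(out)
        return out

rng = np.random.default_rng(1)
dim = 16
layers = [ToyLayer(rng.normal(size=(dim, dim)) / dim) for _ in range(16)]

captured = {}                              # layer index -> residual tensor
for idx in (13, 14, 15):                   # the layers of interest (L13/L14/L15)
    layers[idx].hooks.append(lambda out, i=idx: captured.__setitem__(i, out.copy()))

x = rng.normal(size=dim)
for layer in layers:
    x = layer(x)
print(sorted(captured), [captured[i].shape for i in sorted(captured)])
```

The `.copy()` matters: hooks that retain a live view of the activation can be silently overwritten by later in-place operations.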

**Qwen122B Experiment Line**: the E114 pattern did not reproduce; on the softmax side, E48 was the clearest carrier tracking generation.

Comparative experiments with models such as DeepSeek and GPT-OSS are also included for cross-model validation.

## Research Significance and Limitations: Paradigms and Boundaries of MoE Interpretability

**Contributions**:
- Demonstrate the effectiveness of controlled experiments in routing analysis;
- Identify expert units related to specific generation functions;
- Establish a mapping method from router signals to generated content.

**Limitations**:
- This is not an SAE training repository; findings rest on router probes;
- No philosophical claims about model consciousness are made;
- Results are model-specific (the E114 pattern is clear in 35B but did not reproduce in 122B).

## Future Directions: Model Expansion and Methodological Optimization

**Pending Experiments**:
1. Residual flow retention test for E48 in the 122B model;
2. Routing behavior analysis of larger-scale models (e.g., 397B);
3. Comparison of routing patterns across architectures (Dense vs MoE).

**Methodological Improvements**:
1. Develop fine-grained token-level causal intervention methods;
2. Establish standardized evaluation protocols for expert function interpretation;
3. Explore the relationship between router training dynamics and expert specialization.
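One concrete shape the token-level causal intervention in item 1 could take (a sketch under an assumed softmax routing scheme, not the project's implementation) is to rescale or ablate a single expert's gate probability at chosen token positions and renormalize before mixing expert outputs:

```python
import numpy as np

def intervene(gate_probs, expert_id, scale=0.0):
    """Rescale one expert's routing probability and renormalize the rest."""
    p = np.asarray(gate_probs, float).copy()
    p[expert_id] *= scale        # scale=0.0 ablates; scale>1.0 amplifies
    return p / p.sum()

# Hypothetical routing distribution over four experts at one token position.
probs = np.array([0.05, 0.60, 0.25, 0.10])
ablated = intervene(probs, expert_id=1, scale=0.0)
print(ablated)                   # expert 1's mass redistributed to the rest
```

Applying this at individual token positions, rather than globally, is what makes the intervention fine-grained: one can ask whether suppressing E114 only on phenomenological tokens changes the generated continuation.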
