Zing Forum

BRIAN-Sphere-LLM: A Latent LLM Routing Framework with Learnable Internal Computation Paths

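This post introduces the BRIAN-Sphere-LLM project, a latent large language model routing framework that learns to organize its internal computation paths through block-level routing, block position state, a terminal output action, and shared canonical memory.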

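Tags: large language models, Transformer, dynamic routing, adaptive computation, neural network architecture, block routing, BRIAN, machine learning, deep learning
Published 2026/06/11 15:14 | Last activity 2026/06/11 15:24 | Estimated reading time: 7 minutes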

Section 01

BRIAN-Sphere-LLM: A Latent LLM Routing Framework with Learnable Internal Computation Paths (Main Guide)

BRIAN-Sphere-LLM Overview

BRIAN-Sphere-LLM is a latent large language model routing framework designed to learn its internal computation paths. Its core innovations are block-level routing, a block position state, a terminal output action, and shared canonical memory. The framework aims to replace the fixed middle-layer computation path of a traditional Transformer with a learnable latent routing graph, so that the amount of computation can adapt to input complexity.

The project is maintained by Miocio-nora, is hosted on GitHub (https://github.com/Miocio-nora/BRIAN), and is under ongoing development.

Section 02

Research Background: Limitations of Fixed Computation Paths in Transformers

Transformer's Fixed Path Dilemma

Since 2017, Transformers have been the standard architecture for NLP, but every input passes through the same fixed sequence of layers. Simple and complex inputs therefore receive the same amount of computation, which wastes resources on easy inputs.

Earlier attempts to address this, such as early exit and adaptive-depth networks, patch the fixed layer stack rather than fundamentally redesign the computation path.

Section 03

Core Design & Architecture of BRIAN

Core Idea: Replace fixed middle layers with a routable block pool, enabling dynamic path selection (e.g., short paths for simple inputs, longer/cyclic paths for complex ones).

Dual State Routing:

  1. Content hidden state (H_r): Traditional Transformer semantic representation.
  2. Block position state (P_r): Records current position in the routing space, aiding path progress awareness.

Router actions: Choose internal blocks (B1-Bm) or OUT (exit to generate output).
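The routing loop described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's implementation: the names `make_block`, `toy_router`, `route`, and the `OUT` sentinel are hypothetical, the blocks are simple residual updates rather than Transformer blocks, and the router is a hand-written rule where the real framework would use a learned one. It only shows how a content state and a block position state are carried together until the router picks OUT or the step budget runs out.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
OUT = -1  # hypothetical sentinel for the terminal OUT action


def make_block(scale: float) -> Callable[[Vector], Vector]:
    """Toy stand-in for a routable block: a residual tanh update."""
    def block(h: Vector) -> Vector:
        return [x + scale * math.tanh(x) for x in h]
    return block


def toy_router(h: Vector, p: Vector, num_blocks: int) -> int:
    """Hypothetical rule-based router returning a block index or OUT.

    It exits once the norm of the content state exceeds a threshold;
    otherwise it walks the pool in order using the step counter in p[0].
    A learned router would score the actions from (h, p) instead.
    """
    norm = math.sqrt(sum(x * x for x in h))
    if norm > 3.0:
        return OUT
    return int(p[0]) % num_blocks


def route(h: Vector, blocks: List[Callable[[Vector], Vector]],
          max_steps: int = 8) -> Tuple[Vector, List[int]]:
    """Run the routing loop and return the final content state and path."""
    p = [0.0, -1.0]  # position state: [steps taken, last block index]
    path: List[int] = []
    for _ in range(max_steps):
        action = toy_router(h, p, len(blocks))
        if action == OUT:
            break
        h = blocks[action](h)              # update content state H_r
        path.append(action)
        p = [p[0] + 1.0, float(action)]    # update position state P_r
    return h, path


if __name__ == "__main__":
    pool = [make_block(0.5) for _ in range(4)]
    final_h, taken = route([1.0, 1.0], pool)
    print("path:", taken)
```

In this sketch an input that already satisfies the exit rule takes an empty path, while other inputs keep visiting blocks until the rule fires or `max_steps` is reached, which mirrors the short-path versus long-path behaviour the design aims for.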

BRIAN-R125 Configuration:

  • Total layers: 12 (2 pre-blocks, 8 route pool blocks, 2 post-blocks).
  • Key parameters: 768 hidden dim, 12 attention heads, SwiGLU feedforward, RMSNorm, RoPE, 32k vocab, 2k initial context length.

Max routing steps: 4-8; initial strategy: top-1 (later top-2 fusion).
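As a rough illustration, the BRIAN-R125 numbers above could be captured in a small config object. The class and field names below are hypothetical and not taken from the repository. Reading "32k" as 32,000 and "2k" as 2,048 is an assumption, as is using 8 (the top of the stated 4-8 range) as the default step cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrianR125Config:
    """Hypothetical container for the BRIAN-R125 settings listed above."""
    pre_blocks: int = 2
    route_pool_blocks: int = 8
    post_blocks: int = 2
    hidden_dim: int = 768
    num_heads: int = 12
    vocab_size: int = 32_000      # "32k" in the post; exact value is assumed
    context_length: int = 2_048   # "2k" in the post; exact value is assumed
    max_routing_steps: int = 8    # the post gives a range of 4-8
    top_k: int = 1                # top-1 first; top-2 fusion comes later

    @property
    def total_layers(self) -> int:
        return self.pre_blocks + self.route_pool_blocks + self.post_blocks

    @property
    def head_dim(self) -> int:
        return self.hidden_dim // self.num_heads

    def validate(self) -> None:
        """Check the internal consistency of the settings."""
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError("hidden_dim must be divisible by num_heads")
        if not 4 <= self.max_routing_steps <= 8:
            raise ValueError("max_routing_steps is expected to be in 4-8")
        if self.top_k not in (1, 2):
            raise ValueError("only top-1 and top-2 routing are described")


if __name__ == "__main__":
    cfg = BrianR125Config()
    cfg.validate()
    print(cfg.total_layers, cfg.head_dim)
```

With the stated values, the three block groups add up to the 12 total layers, and 768 hidden dimensions split over 12 heads gives 64 dimensions per head.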

Section 04

Phased Training Strategy for BRIAN

The project uses a 7-stage progressive training approach:

  1. Stage 0: Train a standard fixed Transformer baseline.
  2. Stage 1: Wrap the middle layers into a route pool but force the original path (to verify there is no performance loss).
  3. Stage 2: Train the router to imitate predefined pseudo paths (so it learns to navigate).
  4. Stage 3: Gradually allow free routing (increasing flexibility).
  5. Stage 4: Enable the OUT terminal action (the model decides when to stop).
  6. Stage 5: Add optional canonical global KV memory (long-context support).
  7. Stage 6: Experimental parallel latent transfer (beam-search-style exploration).
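One way to picture the staged schedule is as a set of feature switches that open up stage by stage. The sketch below is an assumption-heavy illustration: the `Stage` enum, the `StageFeatures` fields, and `features_for` are hypothetical names, and it assumes that each stage keeps everything enabled by the earlier ones, which the post suggests with "progressive" but does not state outright.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """The seven training stages, numbered as in the list above."""
    BASELINE = 0
    FORCED_PATH = 1
    IMITATE_PSEUDO_PATHS = 2
    FREE_ROUTING = 3
    OUT_ACTION = 4
    GLOBAL_KV_MEMORY = 5
    PARALLEL_TRANSFER = 6


@dataclass(frozen=True)
class StageFeatures:
    """Hypothetical feature switches for one training stage."""
    route_pool: bool          # middle layers wrapped into a route pool
    learned_router: bool      # router is trained rather than bypassed
    free_routing: bool        # router may leave the predefined paths
    out_action: bool          # router may choose the terminal OUT action
    global_kv: bool           # optional canonical global KV memory
    parallel_transfer: bool   # experimental parallel latent transfer


def features_for(stage: Stage) -> StageFeatures:
    """Assume features accumulate: later stages keep earlier ones on."""
    return StageFeatures(
        route_pool=stage >= Stage.FORCED_PATH,
        learned_router=stage >= Stage.IMITATE_PSEUDO_PATHS,
        free_routing=stage >= Stage.FREE_ROUTING,
        out_action=stage >= Stage.OUT_ACTION,
        global_kv=stage >= Stage.GLOBAL_KV_MEMORY,
        parallel_transfer=stage >= Stage.PARALLEL_TRANSFER,
    )


if __name__ == "__main__":
    for stage in Stage:
        print(stage.name, features_for(stage))
```

Keeping the switches in one place makes it easy to compare two stages and to check that a run really differs from the previous stage by a single capability.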

Section 05

Evaluation Metrics & Current Implementation Status

Key Metrics:

  • Basic: Validation loss, perplexity.
  • Routing-specific: Route entropy (decision uncertainty), block load entropy (load distribution), average steps (efficiency), difficulty-step correlation (critical: positive means more steps for harder inputs).
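The metrics above can be made concrete with a few standard formulas. The definitions below are plausible readings rather than the project's own: route entropy is taken as the mean Shannon entropy of the router's per-decision action distributions, block load entropy as the entropy of how often each block is visited, and difficulty-step correlation as a Pearson correlation between a per-input difficulty score and the number of routing steps. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def perplexity(mean_nll: float) -> float:
    """Perplexity from a mean per-token negative log-likelihood (nats)."""
    return math.exp(mean_nll)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_entropy(action_dists: Sequence[Sequence[float]]) -> float:
    """Mean entropy over the router's action distributions."""
    return sum(entropy(d) for d in action_dists) / len(action_dists)


def block_load_entropy(paths: Sequence[Sequence[int]], num_blocks: int) -> float:
    """Entropy of the empirical block-visit distribution."""
    counts = Counter(b for path in paths for b in path)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return entropy([counts[b] / total for b in range(num_blocks)])


def average_steps(paths: Sequence[Sequence[int]]) -> float:
    """Mean number of routing steps per input."""
    return sum(len(p) for p in paths) / len(paths)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def difficulty_step_correlation(difficulty: Sequence[float],
                                paths: Sequence[Sequence[int]]) -> float:
    """Positive values mean harder inputs tend to take more steps."""
    steps: List[float] = [float(len(p)) for p in paths]
    return pearson(difficulty, steps)
```

For example, calling `difficulty_step_correlation` with difficulty scores that rise together with path length gives a value close to 1, while a router that spends the same number of steps on every input gives 0 under this definition.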

Current Implementation: v0.1 PyTorch scaffold includes:

  • Reproducible data packing, synthetic test data.
  • LLaMA-style decoder baseline, BRIAN routing core wrapper.
  • Entry points for stages 0-6, plus the OUT action, global KV memory, and parallel transfer.
  • Logging (JSONL), checkpoint management, route report generation.
  • B200-compatible conda environment.

Section 06

Practical Significance, Challenges & Future Outlook

Significance:

  • Efficiency: Adjust computation for input complexity.
  • Interpretability: Routing paths as "thinking process" evidence.
  • Adaptive reasoning: Real-time feedback adjustment.
  • New paradigm: Potential for next-gen adaptive neural architectures.

Challenges:

  • Training stability (discrete routing decisions).
  • Optimization complexity (joint routing and block parameter tuning).
  • Scalability (small to large model migration).
  • Evaluation complexity (new methods for routing quality).

Outlook: BRIAN is an early-stage project with an ambitious design and is worth following. Researchers can consult the GitHub repository for technical documentation and implementation guides.