Section 01
[Introduction] Thinking-Bert: Exploring a Hierarchical Reasoning Architecture for Small Models to Achieve "Deep Thinking"
This project is an experimental exploration aimed at enabling small encoder models (only 256 dimensions and 8 layers) to achieve "thinking" capabilities similar to large reasoning models. By integrating a two-layer iterative processing mechanism and Adaptive Computation Time (ACT) technology, it verifies the reasoning potential of lightweight models and provides new possibilities for resource-constrained scenarios (such as edge devices and mobile terminals).