MBT (Metacognitive Behavioral Tuning) addresses the "forgetting" problem in multi-hop reasoning. Drawing on human metacognitive theory, it injects a five-stage metacognitive structure into the model's reasoning trajectories:
- Understanding & Filtering: Identify the key information in the problem and filter out irrelevant distractors
- Planning: Formulate an overall strategy for multi-step reasoning
- Execution & Monitoring: Advance reasoning according to the plan while monitoring the validity of intermediate results
- Self-Correction: Adjust direction promptly when deviations are found
- Verification: Confirm that the final answer is correct and complete
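One way to make the five-stage structure concrete is to treat a trajectory as an ordered list of stage-tagged segments. The sketch below is a minimal illustration under stated assumptions, not MBT's actual data format: the `Stage`/`Segment` names, the `[STAGE_NAME]` tag syntax, and the ordering rule in `is_well_formed` (Self-Correction may appear zero or more times, interleaved with Execution & Monitoring) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The five metacognitive stages described above."""
    UNDERSTANDING_FILTERING = "Understanding & Filtering"
    PLANNING = "Planning"
    EXECUTION_MONITORING = "Execution & Monitoring"
    SELF_CORRECTION = "Self-Correction"
    VERIFICATION = "Verification"


@dataclass
class Segment:
    stage: Stage
    text: str


def is_well_formed(trajectory: list[Segment]) -> bool:
    """Hypothetical ordering rule (an assumption, not from the MBT source):
    Understanding & Filtering, then Planning, then one or more
    Execution & Monitoring steps optionally interleaved with
    Self-Correction, and Verification last."""
    stages = [seg.stage for seg in trajectory]
    if len(stages) < 4:
        return False
    middle = stages[2:-1]  # non-empty because len(stages) >= 4
    return (
        stages[0] is Stage.UNDERSTANDING_FILTERING
        and stages[1] is Stage.PLANNING
        and stages[-1] is Stage.VERIFICATION
        and middle[0] is Stage.EXECUTION_MONITORING
        and all(s in (Stage.EXECUTION_MONITORING, Stage.SELF_CORRECTION) for s in middle)
    )


def render(trajectory: list[Segment]) -> str:
    """Serialize with one hypothetical [STAGE_NAME] tag per segment."""
    return "\n".join(f"[{seg.stage.name}] {seg.text}" for seg in trajectory)


# Toy two-hop example; the content is illustrative only.
example = [
    Segment(Stage.UNDERSTANDING_FILTERING, "Need the river through the capital of France; ignore population trivia."),
    Segment(Stage.PLANNING, "Hop 1: find the capital. Hop 2: find the river that flows through it."),
    Segment(Stage.EXECUTION_MONITORING, "Capital is Lyon."),
    Segment(Stage.SELF_CORRECTION, "Lyon is wrong; the capital is Paris."),
    Segment(Stage.EXECUTION_MONITORING, "The Seine flows through Paris."),
    Segment(Stage.VERIFICATION, "Both hops check out. Answer: Seine."),
]
print(is_well_formed(example))  # True
```

The Self-Correction segment in the example shows where a monitored intermediate result (the wrong capital) would be caught before it propagates to the second hop.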
MBT offers two implementation modes:
MBT-S (Synthesis Mode)
Generates entirely new metacognitive reasoning trajectories from scratch, typically by having a teacher model produce high-quality demonstration trajectories. This makes it suitable for building training data from the ground up.
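A synthesis step in the spirit of MBT-S might prompt a teacher model for a stage-tagged trajectory and keep only outputs that contain the required stages. This is a minimal sketch, assuming a teacher exposed as a plain `Callable[[str], str]`; the prompt wording, the `[STAGE_NAME]` tags, the helper names (`build_synthesis_prompt`, `parse_stages`, `synthesize`), and the rejection filter are all hypothetical and not taken from MBT itself.

```python
import re
from typing import Callable, Optional

STAGES = [
    "UNDERSTANDING_FILTERING",
    "PLANNING",
    "EXECUTION_MONITORING",
    "SELF_CORRECTION",
    "VERIFICATION",
]
# Hypothetical tag format: a line starting with [STAGE_NAME].
TAG = re.compile(r"^\[(" + "|".join(STAGES) + r")\]\s*(.*)$")


def build_synthesis_prompt(question: str) -> str:
    """Hypothetical teacher prompt asking for a fresh, stage-tagged trajectory."""
    stage_list = "\n".join(f"[{name}]" for name in STAGES)
    return (
        "Solve the question below from scratch. Structure your reasoning with "
        "these stage tags, in this order (use SELF_CORRECTION only if you find "
        f"an error):\n{stage_list}\n\nQuestion: {question}"
    )


def parse_stages(text: str) -> list[tuple[str, str]]:
    """Split teacher output into (stage, text) pairs; untagged lines are
    appended to the preceding segment."""
    segments: list[tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        m = TAG.match(line)
        if m:
            segments.append((m.group(1), m.group(2)))
        elif segments and line:
            stage, body = segments[-1]
            segments[-1] = (stage, f"{body} {line}".strip())
    return segments


def synthesize(question: str, teacher: Callable[[str], str]) -> Optional[list[tuple[str, str]]]:
    """Query the teacher once; assumed filter: reject outputs missing a required stage."""
    segments = parse_stages(teacher(build_synthesis_prompt(question)))
    present = {stage for stage, _ in segments}
    required = set(STAGES) - {"SELF_CORRECTION"}
    return segments if required <= present else None


# Stub standing in for a real teacher model (illustrative only).
def fake_teacher(prompt: str) -> str:
    return (
        "[UNDERSTANDING_FILTERING] Need the river through the capital of France.\n"
        "[PLANNING] Find the capital, then the river that flows through it.\n"
        "[EXECUTION_MONITORING] The capital is Paris; the Seine flows through Paris.\n"
        "[VERIFICATION] Both hops are consistent. Answer: Seine."
    )
```

In practice a synthesis pipeline would call the teacher many times per question; the single call here only keeps the sketch short.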
MBT-R (Rewriting Mode)
Rewrites the student model's own reasoning trajectories into metacognitive form. Because it starts from existing model outputs and injects the metacognitive framework through structured rewriting, it is more efficient than synthesizing trajectories from scratch.
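A rewriting step in the spirit of MBT-R might hand the student's existing trace to a rewriter model and accept the result only if it adds the stage structure without changing the answer. This is a minimal sketch under assumptions: the rewriter is a hypothetical `Callable[[str], str]`, answers are assumed to end with an `Answer:` line, and the acceptance checks (required tags present in order, final answer unchanged) are illustrative choices rather than MBT's documented procedure.

```python
import re
from typing import Callable, Optional

STAGE_TAGS = [
    "[UNDERSTANDING_FILTERING]",
    "[PLANNING]",
    "[EXECUTION_MONITORING]",
    "[SELF_CORRECTION]",
    "[VERIFICATION]",
]
# Assumption: Self-Correction is optional; the other four stages are required.
REQUIRED = [t for t in STAGE_TAGS if t != "[SELF_CORRECTION]"]


def final_answer(text: str) -> Optional[str]:
    """Return the text after the last hypothetical 'Answer:' marker, if any."""
    matches = re.findall(r"Answer:\s*([^\n]+)", text)
    return matches[-1].strip() if matches else None


def build_rewrite_prompt(question: str, student_trace: str) -> str:
    """Hypothetical prompt: restructure, do not re-solve."""
    tags = "\n".join(STAGE_TAGS)
    return (
        "Rewrite the reasoning below into these stages, keeping its steps and "
        "final answer; add SELF_CORRECTION only where the original revised a "
        f"step:\n{tags}\n\nQuestion: {question}\n\nOriginal reasoning:\n{student_trace}"
    )


def rewrite(question: str, student_trace: str, rewriter: Callable[[str], str]) -> Optional[str]:
    original = final_answer(student_trace)
    rewritten = rewriter(build_rewrite_prompt(question, student_trace))
    positions = [rewritten.find(tag) for tag in REQUIRED]
    tags_ok = -1 not in positions and positions == sorted(positions)
    # Assumed filter: reject rewrites that drop or reorder a stage or alter the answer.
    if original is None or not tags_ok or final_answer(rewritten) != original:
        return None
    return rewritten


# Stub student trace and rewriter (illustrative only).
student = "Paris is the capital of France. The Seine flows through Paris.\nAnswer: Seine"


def fake_rewriter(prompt: str) -> str:
    return (
        "[UNDERSTANDING_FILTERING] Need the river through the capital of France.\n"
        "[PLANNING] Find the capital, then its river.\n"
        "[EXECUTION_MONITORING] Paris is the capital; the Seine flows through it.\n"
        "[VERIFICATION] Consistent with both hops.\n"
        "Answer: Seine"
    )
```

Checking that the answer survives the rewrite reflects the idea that MBT-R restructures the student's reasoning rather than replacing it.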