# MTP-D: Self-Distillation Boosts Multi-Token Prediction, Achieving 220% Inference Acceleration

> MTP-D uses self-distillation to raise the acceptance rate of multi-token prediction (MTP) heads by 7.5%, and its looped extension strategy achieves a 220.4% inference acceleration over single-head MTP, offering a new direction for improving LLM inference efficiency.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-03-25T04:00:29.000Z
- Last activity: 2026-03-27T05:22:35.145Z
- Popularity: 86.6
- Keywords: multi-token prediction, self-distillation, inference acceleration, large language models, inference efficiency
- Page link: https://www.zingnex.cn/en/forum/thread/mtp-d-token-220
- Canonical: https://www.zingnex.cn/forum/thread/mtp-d-token-220
- Markdown source: floors_fallback

---

## Background and Challenges

As large language models grow in scale, **inference efficiency** has become a key bottleneck. Multi-token prediction (MTP) speeds up inference by predicting several future tokens in parallel, but it faces two major challenges:

1. Limited acceptance rate of MTP heads
2. Difficulty in joint training of multiple MTP heads
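
To make the first challenge concrete, here is a minimal sketch of how a per-head acceptance rate could be measured. It assumes greedy verification, where a drafted token is accepted only if it matches the token the main head would emit at that position and every earlier draft in the same step was also accepted. The function name and data layout are hypothetical and are not taken from the MTP-D work.

```python
def per_head_acceptance(drafts, targets):
    """Estimate the acceptance rate of each MTP head (illustrative sketch).

    drafts[i][k]  : token proposed by head k at decoding step i
    targets[i][k] : token the main head produces at the same position
    A draft from head k counts as accepted only if heads 0..k all match.
    """
    num_steps = len(drafts)
    num_heads = len(drafts[0])
    accepted = [0] * num_heads
    for draft, target in zip(drafts, targets):
        for k in range(num_heads):
            if draft[k] != target[k]:
                break  # first mismatch rejects this head and all later ones
            accepted[k] += 1
    return [count / num_steps for count in accepted]


rates = per_head_acceptance(
    drafts=[[1, 2, 3], [1, 9, 3], [7, 2, 3]],
    targets=[[1, 2, 3], [1, 2, 3], [1, 2, 3]],
)
print(rates)
```

Because a later head only counts when all earlier heads were accepted, rates can never increase from one head to the next, which is one reason adding more heads gives diminishing returns unless their accuracy improves.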

## MTP-D: Self-Distillation Solution

**Core Innovation**: A simple and efficient self-distillation method

- **Minimal additional training cost**
- **7.5% increase in MTP head acceptance rate**
- **Maximally preserves main head performance**
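
The post does not give the exact training objective, so the following is only a sketch of what a self-distillation loss could look like. It assumes the main head's output distribution for a future position acts as a fixed teacher and the MTP head's distribution for that same position is the student, compared with a KL divergence (a common choice in distillation, not confirmed as the one used by MTP-D). All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(main_head_logits, mtp_head_logits, temperature=1.0):
    """Assumed loss: pull the MTP head toward the main head's distribution.

    The teacher (main head) is treated as a constant, so in a real training
    loop no gradient would flow into it; that is one plausible way to leave
    main head performance untouched.
    """
    teacher = softmax(main_head_logits, temperature)
    student = softmax(mtp_head_logits, temperature)
    return kl_divergence(teacher, student)


print(self_distillation_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0]))
```

The loss is zero when the two heads agree exactly and grows as the MTP head's distribution drifts from the main head's, which matches the intuition that better agreement leads to more accepted drafts.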

## Looped Extension Strategy

The work also introduces a looped extension strategy that:
- Expands the number of MTP heads economically and efficiently
- Achieves **220.4% inference acceleration** compared to single-head MTP
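
The post does not describe how the looped extension works internally, but the reason extra heads can help is simple arithmetic. The sketch below estimates the expected number of tokens produced per main-model forward pass, under the simplifying assumptions that head k is accepted with a fixed conditional probability once all earlier heads were accepted, and that drafting and verification overhead is ignored. The function name and the example rates are illustrative, not measurements from the paper.

```python
def expected_tokens_per_step(conditional_accept_rates):
    """Expected tokens emitted per forward pass (illustrative model).

    The main head always contributes one token. Head k contributes one more
    token only if heads 0..k are all accepted, so its contribution is the
    running product of the conditional acceptance rates.
    """
    expected = 1.0
    prefix_probability = 1.0
    for rate in conditional_accept_rates:
        prefix_probability *= rate
        expected += prefix_probability
    return expected


single_head = expected_tokens_per_step([0.5])
three_heads = expected_tokens_per_step([0.5, 0.5, 0.5])
print(single_head, three_heads)
```

Under this model, each added head contributes less than the one before it, so raising acceptance rates and adding heads cheaply are complementary goals.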

## Experimental Validation

The authors systematically evaluate the approach on seven benchmarks, which yields:

- Key insights into distillation strategies
- Scalability potential of MTP

## Practical Value

By improving both the acceptance rate of MTP heads and overall inference efficiency, this work moves MTP closer to practical use in LLMs.
