Section 01
[Main Floor/Introduction] MLP Block Replacement: A New Module-Level Knowledge Distillation Approach for LLM Compression
A graduation thesis from Comenius University in Bratislava proposes an approach to LLM compression that differs from quantization and pruning: the MLP blocks of a Transformer are treated as independent functions and replaced one by one with smaller networks trained to approximate them. This module-level knowledge distillation opens a new direction for model compression, since it removes the need for end-to-end retraining of the entire model and offers advantages such as modularity, controllability, and interpretability.
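To make the idea concrete, below is a minimal sketch of module-level distillation of a single MLP block. The sizes, the two-layer GELU architecture, the MSE objective, and the use of random tensors in place of real hidden states are all illustrative assumptions, not details taken from the thesis.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a GPT-style block with hidden size 768 and a 4x MLP expansion;
# the student keeps a much narrower intermediate width.
HIDDEN, EXPANSION, SMALL = 768, 4 * 768, 768

class TeacherMLP(nn.Module):
    """Stand-in for one frozen MLP block of the original Transformer."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIDDEN, EXPANSION), nn.GELU(), nn.Linear(EXPANSION, HIDDEN)
        )
    def forward(self, x):
        return self.net(x)

class StudentMLP(nn.Module):
    """Smaller replacement trained to approximate the teacher's input-output mapping."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIDDEN, SMALL), nn.GELU(), nn.Linear(SMALL, HIDDEN)
        )
    def forward(self, x):
        return self.net(x)

teacher, student = TeacherMLP().eval(), StudentMLP()
for p in teacher.parameters():  # the teacher (and the rest of the model) stays frozen
    p.requires_grad_(False)

opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(1000):
    # In practice x would be hidden states captured at this block while running real
    # text through the full model; random activations keep this sketch self-contained.
    x = torch.randn(32, 128, HIDDEN)
    with torch.no_grad():
        target = teacher(x)
    loss = loss_fn(student(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Once trained, the student can be swapped into the Transformer in place of the original block.
```

Because each block is distilled against its own frozen teacher in isolation, blocks can be replaced independently and any quality regression can be traced to a specific module, which is where the modularity and interpretability claims come from.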