# Awesome-Efficient-Large-Models: A Comprehensive Resource Library for Large Model Compression and Acceleration Technologies

> A continuously updated curated list of academic papers that systematically organizes compression, acceleration, and efficient inference technologies for large language models (LLMs) and multimodal large models (MLLMs), covering core directions such as quantization, pruning, distillation, and architecture optimization.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-12T09:59:10.000Z
- Last activity: 2026-05-12T10:23:47.235Z
- Popularity: 154.6
- Keywords: large model compression, model quantization, knowledge distillation, efficient inference, LLM acceleration, multimodal large models, model pruning, sparse attention, speculative decoding, open-source resources
- Page link: https://www.zingnex.cn/en/forum/thread/awesome-efficient-large-models
- Canonical: https://www.zingnex.cn/forum/thread/awesome-efficient-large-models
- Markdown source: floors_fallback

---

## [Main Floor/Introduction] Awesome-Efficient-Large-Models: A Comprehensive Resource Library for Large Model Compression and Acceleration Technologies

Maintained by the MAC-AutoML team, **Awesome-Efficient-Large-Models** is a continuously updated, curated library of academic papers. It systematically organizes compression, acceleration, and efficient inference technologies for large language models (LLMs) and multimodal large models (MLLMs), covering core directions such as quantization, pruning, distillation, and architecture optimization. With over 400 papers collected so far, it serves as a cutting-edge reference for researchers and engineers and has become one of the most influential open-source resources in this field.

## Project Background: The Urgency of Large Model Efficiency Optimization

As the parameter scale of LLMs and MLLMs grows exponentially, maintaining performance while reducing computational cost and improving inference efficiency has become a pressing challenge in the AI field. The Awesome-Efficient-Large-Models project emerged as a resource library that systematically collects and classifies cutting-edge research in this area. It is not just a list of papers but a panoramic map of the technology landscape, covering the complete stack from compression and architectural innovation to inference acceleration.

## Core Technical Directions and Classification System

The resource library uses a multi-dimensional classification system, summarized into three core directions (illustrative code sketches for each direction follow this list):
1. **Model Compression Technologies**: quantization (INT8/INT4 methods such as GPTQ and AWQ), pruning (structured/unstructured sparsification), and knowledge distillation (white-box and black-box);
2. **Efficient Architecture Design**: sparse attention (e.g., Sparse Transformer), Mixture of Experts (MoE), and linear attention and state space models (e.g., Mamba);
3. **Inference Acceleration and System Optimization**: speculative decoding, KV cache optimization (paged/quantized caches), and continuous batching.
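To make the first direction concrete, here is a minimal, hedged NumPy sketch of symmetric per-channel INT8 weight quantization. It only illustrates the scale-round-clip pattern that methods like GPTQ and AWQ refine with error compensation and activation-aware scaling; it is not any specific method from the list, and all names are illustrative.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-output-channel INT8 quantization of a weight matrix."""
    # One scale per row (output channel): map the largest |w| to 127.
    scales = np.abs(weights).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # guard against all-zero rows
    q = np.clip(np.round(weights / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    # Recover an approximation of the original FP32 weights.
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scales = quantize_int8(w)
print("max abs reconstruction error:", np.abs(w - dequantize(q, scales)).max())
```

Storing the INT8 tensor plus one floating-point scale per channel is roughly a 4x memory reduction over FP32; the reconstruction error printed above is the price paid.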
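For the Mixture-of-Experts entry in the second direction, the following toy forward pass shows top-k gating: each token's router picks k experts and mixes their outputs with renormalized gate weights. This is a sketch under simplifying assumptions (random toy experts, a per-token Python loop), not a production MoE layer, which would batch the dispatch.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy MoE layer: route each token to its top-k experts."""
    logits = x @ gate_w                              # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]       # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)  # their gate logits
    # Softmax over only the selected experts' logits.
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):        # per-token dispatch, for readability only
        for j in range(k):
            out[i] += weights[i, j] * experts[topk[i, j]](x[i])
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
# Hypothetical experts: small random linear maps.
experts = [(lambda W: (lambda v: v @ W))(0.1 * rng.normal(size=(d, d)))
           for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
print(moe_forward(rng.normal(size=(3, d)), gate_w, experts).shape)  # (3, 8)
```

The efficiency argument is that only k of n_experts experts run per token, so total parameter count scales with the number of experts while per-token compute scales only with k.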
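And for the third direction, a toy sketch of speculative decoding's draft-then-verify loop. Both "models" below are hypothetical stand-ins (a deterministic rule for the expensive target, a noisy copy of it for the cheap draft); the point is the control flow: the draft proposes k tokens, the target verifies them greedily, and every step emits at least one target-approved token.

```python
import random

random.seed(0)
VOCAB = list("abcde")

def target_next(prefix: str) -> str:
    # Stand-in for the expensive target model: a deterministic toy rule.
    return VOCAB[sum(map(ord, prefix)) % len(VOCAB)]

def draft_next(prefix: str) -> str:
    # Stand-in for the cheap draft model: agrees with the target ~80% of the time.
    return target_next(prefix) if random.random() < 0.8 else random.choice(VOCAB)

def speculative_step(prefix: str, k: int = 4) -> str:
    """Draft k tokens, then keep the longest target-approved run plus one token."""
    draft, ctx = [], prefix
    for _ in range(k):                      # cheap autoregressive drafting
        draft.append(draft_next(ctx))
        ctx += draft[-1]
    accepted, ctx = [], prefix
    for t in draft:                         # greedy verification by the target
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx += t
    accepted.append(target_next(ctx))       # one "free" token at the divergence point
    return prefix + "".join(accepted)

seq = "a"
for _ in range(5):
    seq = speculative_step(seq)
print(seq)
```

The real speedup comes from the target verifying all k draft tokens in a single batched forward pass instead of k sequential decoding steps; the toy above only captures the accept/reject logic.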

## Efficiency Challenges and Research Directions for Multimodal Large Models

Compared to text-only LLMs, MLLMs face more complex efficiency optimization problems. The resource library has a dedicated section for vision-language model research: lightweight design of visual encoders, efficient training for cross-modal alignment, acceleration of multimodal inference, and optimization for edge deployment, all of which help bring large models to mobile and edge scenarios.

## Practical Value and Application Scenarios

Value for different roles:
- **Researchers**: an indexed map of cutting-edge literature with a clear classification, easing literature review and helping avoid duplicated work;
- **Algorithm Engineers**: open-source implementations and benchmarks that can be consulted directly during technology selection;
- **Product Teams**: a way to understand technical boundaries, balance model capability against deployment cost, and develop practical roadmaps.

## Community Contribution and Sustainable Development

The project is released under the MIT open-source license and welcomes community contributions through a standardized process (paper submission template, classification standards, quality review). A Paper Collection navigation feature lets readers browse papers by topic, a Contributing guide documents the workflow, and open collaboration keeps the resource library tracking advances in the field.

## Conclusion and Outlook

Large model efficiency optimization is a vibrant field in which new techniques emerge constantly, and Awesome-Efficient-Large-Models provides a systematic and timely knowledge infrastructure for it. As model scales expand and application scenarios broaden, efficiency optimization only grows in importance; the library's continuous updates will help practitioners stay at the forefront and push large models toward more efficient and more inclusive deployment.
