# IREE Optimization Experiments: Dynamic Shape Inference Optimization for LLMs like DeepSeek, Qwen, and Gemma

> The iree-optimization project, open-sourced by the PLC Lab at Chongqing University, focuses on conducting dynamic shape optimization experiments for large language models (LLMs) such as DeepSeek, Qwen, and Gemma using the IREE compiler, exploring technical paths for efficiently running LLMs on edge devices.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-15T06:46:46.000Z
- Last activity: 2026-06-15T06:57:56.736Z
- Popularity: 152.8
- Keywords: IREE, LLM inference optimization, dynamic shapes, DeepSeek, Qwen, Gemma, compiler optimization, MLIR, edge AI
- Page link: https://www.zingnex.cn/en/forum/thread/iree-deepseekqwengemmallm
- Canonical: https://www.zingnex.cn/forum/thread/iree-deepseekqwengemmallm

---

## Introduction

The PLC Lab at Chongqing University has open-sourced the iree-optimization project, which uses the IREE compiler to run dynamic shape optimization experiments on mainstream large language models (LLMs) such as DeepSeek, Qwen, and Gemma, with the goal of finding practical ways to run LLMs efficiently on edge devices. Building on IREE, the project targets the problems that dynamic shapes create for static compilation in LLM inference and offers reference material for compiler-based LLM deployment.

## Background: Compiler Optimization Challenges in LLM Inference

LLM inference efficiency is a key bottleneck when deploying AI applications. Optimizations inside traditional deep learning frameworks such as PyTorch and TensorFlow are limited by the frameworks' runtime overhead. IREE is an MLIR-based compiler and runtime, open-sourced by Google, that supports multiple hardware backends (CPU, GPU, and others) and applies advanced compilation optimizations. LLM inference, however, is full of dynamic shapes: sequence lengths vary between requests, autoregressive generation switches from a long prompt (prefill) to one token at a time (decode), batch sizes change, and the KV cache grows with every generated token. Static compilation, which assumes fixed tensor shapes, handles these cases poorly.
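The following Python sketch makes the dynamic-shape problem concrete by tracing tensor shapes through prefill and decode for a generic decoder-only transformer with a KV cache. The `DecoderConfig` fields and their default values are hypothetical placeholders, not the configurations of DeepSeek, Qwen, or Gemma, and the code does not call IREE; it only illustrates which dimensions change at runtime.

```python
from dataclasses import dataclass


@dataclass
class DecoderConfig:
    # Hypothetical hyperparameters for illustration; not taken from any real model.
    num_layers: int = 4
    num_kv_heads: int = 8
    head_dim: int = 64
    bytes_per_elem: int = 2  # 2 bytes per element, e.g. fp16 or bf16


def step_shapes(cfg, batch, prompt_len, new_tokens):
    """Yield (phase, input shape, per-layer K cache shape after the pass).

    Assumes new_tokens >= 1: one prefill pass produces the first new token,
    and each of the remaining new_tokens - 1 decode passes produces one more.
    """
    # Prefill: the sequence dimension equals the prompt length, which varies per request.
    cache_len = prompt_len
    yield "prefill", (batch, prompt_len), (batch, cfg.num_kv_heads, cache_len, cfg.head_dim)
    # Decode: each pass feeds a single token, but the cache length grows by one.
    for _ in range(new_tokens - 1):
        cache_len += 1
        yield "decode", (batch, 1), (batch, cfg.num_kv_heads, cache_len, cfg.head_dim)


def kv_cache_bytes(cfg, batch, seq_len):
    """Bytes needed for the K and V tensors across all layers at a given sequence length."""
    return 2 * cfg.num_layers * batch * cfg.num_kv_heads * seq_len * cfg.head_dim * cfg.bytes_per_elem


if __name__ == "__main__":
    cfg = DecoderConfig()
    for phase, ids_shape, kv_shape in step_shapes(cfg, batch=1, prompt_len=5, new_tokens=3):
        print(phase, ids_shape, kv_shape)
    print(kv_cache_bytes(cfg, batch=1, seq_len=7))
```

With the defaults, a 5-token prompt that generates 3 tokens produces input shapes (1, 5), then (1, 1) twice, while the cache's sequence dimension grows 5, 6, 7, and `kv_cache_bytes(cfg, batch=1, seq_len=7)` returns 57,344 bytes. Each distinct shape is one a statically compiled module would otherwise have to be specialized for, which is why these dimensions are candidates for dynamic or bucketed handling.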

## Project Content: Dynamic Shape Optimization for Mainstream LLMs

The iree-optimization project is maintained by PLC-CQU and includes test scripts for the following models:
- DeepSeek: Exploring compilation to IREE format and dynamic shape handling;
- Qwen: Researching Chinese-language processing and compilation optimization for dynamic sequence lengths;
- Gemma: Exploring efficient dynamic inference of lightweight models in IREE.

## Technical Exploration Directions: Compilation Flow and Optimization Strategies

The project's technical exploration directions include:
1. Compilation flow: Model import → Shape analysis → Compilation configuration → Code generation → Runtime integration;
2. Dynamic shape handling strategies: fully dynamic, partially dynamic, multi-version compilation, dynamic batching;
3. Performance optimization: memory planning, operator fusion, quantization support, parallelization strategies.
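One common way to realize the multi-version compilation strategy listed above is shape bucketing: compile one static-shape variant per sequence-length bucket, then pad each request up to the smallest bucket that fits. The sketch below shows only the dispatch and padding logic in plain Python. The bucket sizes, `PAD_ID`, and helper names are hypothetical, nothing here calls IREE, and the source does not say whether iree-optimization uses this exact scheme.

```python
from bisect import bisect_left

# Hypothetical sequence-length buckets, one precompiled static-shape module per bucket.
BUCKETS = [128, 256, 512, 1024]
PAD_ID = 0  # hypothetical padding token id


def select_bucket(seq_len, buckets=BUCKETS):
    """Return the smallest bucket >= seq_len; buckets must be sorted ascending."""
    i = bisect_left(buckets, seq_len)
    if i == len(buckets):
        raise ValueError(f"sequence length {seq_len} exceeds largest bucket {buckets[-1]}")
    return buckets[i]


def pad_to_bucket(token_ids, buckets=BUCKETS, pad_id=PAD_ID):
    """Right-pad token_ids to the selected bucket and build a matching attention mask."""
    bucket = select_bucket(len(token_ids), buckets)
    pad = bucket - len(token_ids)
    padded = list(token_ids) + [pad_id] * pad
    mask = [1] * len(token_ids) + [0] * pad  # 1 = real token, 0 = padding
    return bucket, padded, mask


def padding_overhead(lengths, buckets=BUCKETS):
    """Fraction of computed sequence positions that are padding across a set of requests."""
    total = sum(select_bucket(n, buckets) for n in lengths)
    return (total - sum(lengths)) / total


if __name__ == "__main__":
    print(pad_to_bucket([5, 6, 7], buckets=[4, 8]))
    print(padding_overhead([100, 200]))
```

The design choice is a trade-off: a fully dynamic module avoids padding waste but gives the compiler less shape information to specialize on, while bucketing keeps every compiled variant static at the cost of extra compile time, larger deployed binaries, and wasted compute on padded positions (about 22% for the two example lengths above).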

## Significance for LLM Deployment

The value of this project lies in:
- Verifying the feasibility of deploying LLMs with the IREE compiler and providing reference implementations;
- Accumulating best practices for dynamic shape handling;
- Supporting cross-platform deployment (from server GPUs to mobile chips);
- Providing an academic research platform for fields such as compiler optimization.

## Usage and Participation Suggestions

Developers can use the project in the following ways:
1. Clone the repository to get test scripts and configurations;
2. Install the IREE compiler and related tools;
3. Run experiments to observe compilation and runtime results;
4. Adjust parameters to adapt to target hardware;
5. Submit Issues or PRs to contribute improvements.

## Conclusion: Exploration and Outlook of LLM Compilation Optimization

The iree-optimization project reflects the academic community's active exploration of LLM deployment optimization: it applies advanced compiler technology to mainstream LLMs and opens new possibilities for efficient, flexible inference. For developers working on LLM compilation optimization, it is an open-source project worth following and contributing to.
