# Team Red LLM: A Practical Guide and Benchmark Database for Local LLM Inference on AMD GPUs

> A community-maintained guide to local LLM deployment on AMD GPUs: step-by-step ROCm/HIP inference instructions, common pitfalls, real-world performance benchmark data, and coverage of consumer Radeon cards, data-center Instinct accelerators, and Strix Halo APUs.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-02T00:07:30.000Z
- Last activity: 2026-05-02T01:46:26.184Z
- Popularity: 147.3
- Keywords: AMD GPU, ROCm, LLM inference, Radeon, local AI, benchmark, open source
- Page: https://www.zingnex.cn/en/forum/thread/team-red-llm-amd-gpu
- Canonical: https://www.zingnex.cn/forum/thread/team-red-llm-amd-gpu

---

## Team Red LLM: AMD GPU Local LLM Inference Guide & Benchmark Database

Team Red LLM is a community-maintained project focused on local LLM deployment on AMD GPUs. It provides step-by-step ROCm/HIP inference instructions, documents common pitfalls, collects real-world performance benchmarks, and covers consumer Radeon cards, data-center Instinct accelerators, and Strix Halo APUs. The project aims to help AMD users avoid ROCm-related pitfalls and improve their local AI experience.

## Project Background: Addressing CUDA Centralization for AMD Users

The open-source LLM ecosystem today is CUDA-centric: most tutorials assume NVIDIA GPUs, toolchains often fall back silently to the CPU when ROCm isn't detected properly, and working fixes are scattered across old Reddit posts. AMD users end up with a 'second-class citizen' experience. Team Red LLM was created as a community cookbook and benchmark database to address this, giving developers a place to document the ROCm pitfalls they hit and help others avoid them.
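
Before debugging performance, it's worth confirming that a toolchain actually sees the GPU at all, since a silent CPU fallback is the most common failure mode. A minimal sanity check, assuming a ROCm build of PyTorch is installed (ROCm builds expose the GPU through the `torch.cuda` namespace):

```python
# Sanity check: is this process actually using the AMD GPU, or silently on CPU?
# Assumes a ROCm build of PyTorch (ROCm builds reuse the torch.cuda namespace).
import torch

print("HIP version:", torch.version.hip)          # None on a CUDA- or CPU-only build
print("GPU visible:", torch.cuda.is_available())  # False usually means a silent CPU fallback
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))  # e.g. an RDNA3 Radeon
```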

## Core Content: Modular Structure for Easy Access

The project uses a modular structure: 
- **COOKBOOK.md**: Step-by-step deployment guide with verified steps and marked pitfalls.
- **benchmarks/results.csv**: Community-contributed token generation speed data (not vendor marketing).
- **hardware/**: Per-GPU docs (BIOS, drivers, cooling, best-fit models).
- **models/**: Model-specific configs (launch params, problematic quant formats, MoE offload tips).
- **scripts/**: Wrapper scripts for llama-server, model switching, and benchmarking (see the launch sketch after this list).
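
To make the scripts/ entry concrete, here is a minimal sketch of what a launch wrapper could look like. It is illustrative rather than the project's actual script: the model path is hypothetical, and while `-m`, `-ngl`, `--port`, and `--n-cpu-moe` appear in recent llama.cpp builds, flag spellings vary between versions, so check `llama-server --help` for yours.

```python
#!/usr/bin/env python3
# Minimal llama-server launch wrapper (a sketch; the model path is
# hypothetical and flags should be verified against your llama.cpp build).
import subprocess
import sys

MODEL = "models/Moonlight-16B-A3B-Instruct-Q6_K.gguf"  # hypothetical path

def launch(moe_cpu_layers: int = 0, port: int = 8080) -> None:
    cmd = [
        "llama-server",
        "-m", MODEL,
        "-ngl", "999",        # offload all layers to the GPU
        "--port", str(port),
    ]
    if moe_cpu_layers > 0:
        # MoE offload mode: keep the first N layers' experts in system RAM.
        cmd += ["--n-cpu-moe", str(moe_cpu_layers)]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    launch(moe_cpu_layers=int(sys.argv[1]) if len(sys.argv) > 1 else 0)
```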

## Benchmark Evidence: RX7900 GRE Performance

The project provides RX7900 GRE 16GB performance data:
| GPU | Architecture | Model | Quantization | Mode | Gen tok/s | Prompt tok/s |
|-----|--------------|-------|--------------|------|-----------|--------------|
| RX7900 GRE | gfx1100 | Moonlight-16B-A3B-Instruct | Q6_K | Full GPU |100.2|188.1|
| RX7900 GRE | gfx1100 | gemma-4-26B-A4B-it | UD-Q4_K_M | MoE offload (-ncmoe6)|31.0|61.3|
| RX7900 GRE | gfx1100 | Qwen3.6-35B-A3B-UD | Q4_K_S | MoE offload (-ncmoe32)|22.7|41.7|
| RX7900 GRE | gfx1100 | gemma-4-26B-A4B-it | UD-Q6_K | MoE offload (-ncmoe16)|17.3|80.7|

**Modes**: Full GPU (fastest, but limited by VRAM) vs. MoE offload (fits larger models by keeping some expert layers in system RAM, but 3-5x slower because it becomes bound by DDR5 memory bandwidth).
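
Numbers like those in the table can be reproduced against a running llama-server instance using the server's own timing report. A minimal sketch, assuming a server already listening on localhost:8080; the `timings` key names match what recent llama.cpp builds return, but verify them against your version's response:

```python
# Measure prompt and generation throughput from a running llama-server.
# Assumes a server at localhost:8080; the `timings` keys below follow
# recent llama.cpp builds and may differ in yours.
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "Explain MoE offloading in one paragraph.", "n_predict": 256},
    timeout=300,
)
resp.raise_for_status()
t = resp.json().get("timings", {})
print(f"prompt tok/s: {t.get('prompt_per_second', 'n/a')}")
print(f"gen tok/s:    {t.get('predicted_per_second', 'n/a')}")
```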

## AMD GPU & ROCm Support Matrix

The project maintains a support matrix:
| Code | Architecture family | Example models | ROCm support status |
|------|---------------------|----------------|---------------------|
| gfx1100 | RDNA3 | RX7900XTX/XT/GRE | ✅ Mature |
| gfx1101 | RDNA3 | RX7800XT/7700XT | ✅ Usable |
| gfx1102 | RDNA3 | RX7600 | ⚠️ Partial support |
| gfx1200/1201 | RDNA4 | RX9070XT/9060 | ✅ Recently added |
| gfx1150/1151 | Strix Halo | Ryzen AI Max+ 395 | ⚠️ Bleeding edge |
| gfx942/950 | CDNA3 | MI300X/MI325X | ✅ Data center |

Strix Halo APUs currently need manual patches or specific kernel versions for ROCm support.
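
For cards with only partial support (e.g., gfx1101/gfx1102 in the matrix above), a widely reported community workaround is to override the reported GFX target via the `HSA_OVERRIDE_GFX_VERSION` environment variable, so the ROCm runtime uses kernels built for a fully supported sibling architecture. Whether this is needed, and which value works, depends on your card and ROCm version; the sketch below just shows the mechanism, with a placeholder model path.

```python
# Widely reported workaround for partially supported RDNA3 cards:
# have ROCm treat the GPU as gfx1100 (the mature target in the matrix above).
# Whether this is needed depends on your card and ROCm version.
import os
import subprocess

env = os.environ.copy()
env["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"  # gfx1100; adjust for your card

# "model.gguf" is a placeholder path.
subprocess.run(["llama-server", "-m", "model.gguf", "-ngl", "999"],
               env=env, check=True)
```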

## How to Contribute

Contribute via:
- **Benchmarks**: Submit PR to results.csv or use GitHub Issue template.
- **Pitfalls**: Add to COOKBOOK.md.
- **Models**: Create model-specific docs in models/.
- **Hardware**: Add GPU docs in hardware/.

Use GitHub Discussions for general topics (hardware suggestions, 'is it worth it' questions) and Issues for bugs.
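
For benchmark PRs, a quick local check that new rows match the expected shape can save review round-trips. A minimal sketch, assuming results.csv uses the seven columns shown in the benchmark table above; the project's real schema may differ, so check the repository first:

```python
# Quick local lint for a benchmarks/results.csv contribution.
# Assumes the seven columns from the benchmark table above; the project's
# actual schema may differ -- verify against the repository.
import csv
import sys

EXPECTED = ["gpu", "arch", "model", "quant", "mode", "gen_tps", "prompt_tps"]

with open(sys.argv[1], newline="") as f:
    rows = list(csv.reader(f))

assert rows[0] == EXPECTED, f"header mismatch: {rows[0]}"
for i, row in enumerate(rows[1:], start=2):
    assert len(row) == len(EXPECTED), f"line {i}: wrong column count"
    float(row[5]); float(row[6])  # tok/s columns must be numeric
print(f"OK: {len(rows) - 1} benchmark rows")
```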

## Significance for Local AI Ecosystem

Team Red LLM demonstrates, through community collaboration, that AMD hardware can handle local LLM inference. It lowers the entry barrier for budget builds (e.g., used RX6000/7000-series cards) and reduces AMD users' reliance on cloud APIs. The project is MIT-licensed, encouraging free use and modification.
