Zing Forum


Team Red LLM: A Practical Guide and Benchmark Database for Local LLM Inference on AMD GPUs

A community-maintained guide to local LLM deployment on AMD GPUs, covering detailed ROCm/HIP inference steps, common pitfalls, and real-world performance benchmarks, with support for consumer Radeon cards, data-center Instinct accelerators, and Strix Halo APUs.

Tags: AMD GPU, ROCm, LLM inference, Radeon, local AI, benchmark, open source
Published 2026/05/02 08:07 · Last activity 2026/05/02 09:46 · Estimated reading time: 5 minutes

Section 01

Team Red LLM: AMD GPU Local LLM Inference Guide & Benchmark Database

Team Red LLM is a community-maintained project focused on local LLM deployment on AMD GPUs. It documents detailed ROCm/HIP inference steps, common pitfalls, and real-world performance benchmarks, and supports consumer Radeon cards, data-center Instinct accelerators, and Strix Halo APUs. The project aims to help AMD users avoid ROCm-related pitfalls and improve their local AI experience.

Section 02

Project Background: Addressing the CUDA-Centric Ecosystem for AMD Users

The current open-source LLM ecosystem is CUDA-centric: most tutorials assume NVIDIA GPUs, toolchains often silently fall back to CPU when ROCm is not detected correctly, and working solutions are scattered across old Reddit posts. AMD users face a 'second-class citizen' experience. Team Red LLM was created as a community cookbook and benchmark database to address this, letting developers share ROCm pitfalls once so that others do not have to rediscover them.

Section 03

Core Content: Modular Structure for Easy Access

The project uses a modular structure:

  • COOKBOOK.md: Step-by-step deployment guide with verified steps and marked pitfalls.
  • benchmarks/results.csv: Community-contributed token generation speed data (not vendor marketing).
  • hardware/: Per-GPU docs (BIOS, drivers, cooling, best models).
  • models/: Model-specific configs (launch params, problematic quant formats, MoE offload tips).
  • scripts/: Wrapper scripts for llama-server, model switching, and benchmarking.
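The wrapper scripts in scripts/ are not reproduced here, but the idea can be sketched as a minimal shell function. This is a hypothetical sketch: the MODEL_DIR layout, defaults, and dry-run mechanism are assumptions, not the repository's actual code; -m, -ngl, --host/--port, and -ncmoe are llama.cpp llama-server flags.

```shell
# Hypothetical sketch of a scripts/-style llama-server wrapper; the repo's
# actual scripts may differ. MODEL_DIR and the default flags are assumptions.
run_llama() {
  model="${1:?usage: run_llama <model.gguf> [extra llama-server args]}"
  shift
  # -ngl 99 offloads all layers to the GPU; append "-ncmoe N" to keep N MoE
  # expert layers in system RAM when the model does not fit in VRAM.
  cmd="llama-server -m ${MODEL_DIR:-$HOME/models}/$model -ngl 99 --host 127.0.0.1 --port 8080 $*"
  if [ -n "$DRY_RUN" ]; then
    echo "$cmd"   # print the final command instead of launching it
  else
    exec $cmd
  fi
}
```

Running `DRY_RUN=1 run_llama my-model.gguf -ncmoe 6` prints the assembled command, which is handy for checking flags before committing VRAM.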
Section 04

Benchmark Evidence: RX 7900 GRE Performance

The project provides performance data for the RX 7900 GRE (16 GB):

| GPU | Arch | Model | Quant | Mode | Gen tok/s | Prompt tok/s |
| --- | --- | --- | --- | --- | --- | --- |
| RX 7900 GRE | gfx1100 | Moonlight-16B-A3B-Instruct | Q6_K | Full GPU | 100.2 | 188.1 |
| RX 7900 GRE | gfx1100 | gemma-4-26B-A4B-it | UD-Q4_K_M | MoE offload (-ncmoe 6) | 31.0 | 61.3 |
| RX 7900 GRE | gfx1100 | Qwen3.6-35B-A3B-UD | Q4_K_S | MoE offload (-ncmoe 32) | 22.7 | 41.7 |
| RX 7900 GRE | gfx1100 | gemma-4-26B-A4B-it | UD-Q6_K | MoE offload (-ncmoe 16) | 17.3 | 80.7 |

Two modes: Full GPU (maximum speed, limited by VRAM capacity) vs. MoE offload (runs larger models, but generation is 3-5x slower because the expert layers kept in system RAM are bound by DDR5 bandwidth).
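When choosing a configuration, rows like those above can be ranked straight from a results.csv-style file. A minimal sketch, assuming a simple comma-separated layout (gpu, arch, model, quant, mode, gen tok/s, prompt tok/s); the repository's actual column schema may differ.

```shell
# Toy results.csv-style data; the column layout is an assumption,
# not the repo's exact schema.
cat > /tmp/results.csv <<'EOF'
RX 7900 GRE,gfx1100,Moonlight-16B-A3B-Instruct,Q6_K,Full GPU,100.2,188.1
RX 7900 GRE,gfx1100,gemma-4-26B-A4B-it,UD-Q4_K_M,MoE offload,31.0,61.3
RX 7900 GRE,gfx1100,Qwen3.6-35B-A3B-UD,Q4_K_S,MoE offload,22.7,41.7
EOF

# Rank configurations by generation tok/s (field 6, descending).
sort -t, -k6,6 -nr /tmp/results.csv |
  awk -F, '{printf "%-28s %-12s %6.1f tok/s\n", $3, $5, $6}'
```

The full-GPU Moonlight row comes out on top, matching the table's ordering by generation speed.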

Section 05

AMD GPU & ROCm Support Matrix

The project maintains a support matrix:

| Code | Architecture family | Example models | ROCm support status |
| --- | --- | --- | --- |
| gfx1100 | RDNA3 | RX 7900 XTX/XT/GRE | ✅ Mature |
| gfx1101 | RDNA3 | RX 7800 XT / 7700 XT | ✅ Usable |
| gfx1102 | RDNA3 | RX 7600 | ⚠️ Partial support |
| gfx1200/1201 | RDNA4 | RX 9070 XT / 9060 | ✅ Recently supported |
| gfx1150/1151 | Strix Halo | Ryzen AI Max+ 395 | ⚠️ Bleeding edge |
| gfx942/950 | CDNA3 | MI300X/MI325X | ✅ Data center |

Strix Halo APUs currently need manual patches or specific kernel versions for ROCm support.
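On parts without official kernels, a common (unofficial) workaround is to make the ROCm runtime treat the GPU as the nearest supported gfx target via the HSA_OVERRIDE_GFX_VERSION environment variable. A hedged sketch; the value 11.0.0 (i.e. gfx1100) is only an example, and the right choice depends on your chip.

```shell
# List the gfx targets ROCm actually detects (skipped if rocminfo is absent).
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
fi

# Unofficial workaround: force the runtime to use kernels for a nearby
# supported target. 11.0.0 corresponds to gfx1100; pick the closest match
# for your chip, and expect breakage on bleeding-edge parts like Strix Halo.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Set the variable before launching llama-server or any other ROCm workload; it only affects processes that inherit the environment.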

Section 06

Ways to Contribute

Contribute via:

  • Benchmarks: Submit PR to results.csv or use GitHub Issue template.
  • Pitfalls: Add to COOKBOOK.md.
  • Models: Create model-specific docs in models/.
  • Hardware: Add GPU docs in hardware/.

Use GitHub Discussions for general topics (hardware suggestions, 'is it worth it' questions) and Issues for bugs.

Section 07

Significance for Local AI Ecosystem

Team Red LLM shows that AMD hardware can handle local LLM inference through community collaboration. It lowers the entry barrier for budget users (e.g., used RX 6000/7000-series cards) and reduces AMD users' reliance on cloud APIs. The project is MIT-licensed, encouraging free use and modification.