Zing Forum


Team Red LLM: A Practical Guide and Benchmark Database for Local LLM Inference on AMD GPUs

A community-maintained guide to local LLM deployment on AMD GPUs, covering detailed ROCm/HIP inference steps, common pitfalls, and real-world performance benchmarks, with support for consumer Radeon cards, data-center Instinct accelerators, and Strix Halo APUs.

Tags: AMD GPU, ROCm, LLM inference, Radeon, local AI, benchmark, open source
Published 2026/05/02 08:07 · Last activity 2026/05/02 09:46 · Estimated reading time: 5 minutes

Section 01

Team Red LLM: AMD GPU Local LLM Inference Guide & Benchmark Database

Team Red LLM is a community-maintained project focused on local LLM deployment on AMD GPUs. It documents detailed ROCm/HIP inference steps, common pitfalls, and real-world performance benchmarks, and supports consumer Radeon cards, data-center Instinct accelerators, and Strix Halo APUs. The project aims to help AMD users avoid ROCm-related pitfalls and improve their local AI experience.

Section 02

Project Background: Addressing the CUDA-Centric Ecosystem for AMD Users

The current open-source LLM ecosystem is CUDA-centric: most tutorials assume NVIDIA GPUs, toolchains often silently fall back to CPU when ROCm is not detected correctly, and working solutions are scattered across old Reddit posts. AMD users face a 'second-class citizen' experience. Team Red LLM was created as a community cookbook and benchmark database to address this, letting developers share ROCm pitfalls once so that others do not have to rediscover them.

Section 03

Core Content: Modular Structure for Easy Access

The project uses a modular structure:

  • COOKBOOK.md: Step-by-step deployment guide with verified steps and marked pitfalls.
  • benchmarks/results.csv: Community-contributed token generation speed data (not vendor marketing).
  • hardware/: Per-GPU docs (BIOS, drivers, cooling, best models).
  • models/: Model-specific configs (launch params, problematic quant formats, MoE offload tips).
  • scripts/: Wrapper scripts for llama-server, model switching, and benchmarking.
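The wrapper scripts in scripts/ are not reproduced here, but the idea can be sketched as a minimal shell function. This is a hypothetical sketch: the MODEL_DIR layout, defaults, and dry-run mechanism are assumptions, not the repository's actual code; -m, -ngl, --host/--port, and -ncmoe are llama.cpp llama-server flags.

```shell
# Hypothetical sketch of a scripts/-style llama-server wrapper; the repo's
# actual scripts may differ. MODEL_DIR and the default flags are assumptions.
run_llama() {
  model="${1:?usage: run_llama <model.gguf> [extra llama-server args]}"
  shift
  # -ngl 99 offloads all layers to the GPU; append "-ncmoe N" to keep N MoE
  # expert layers in system RAM when the model does not fit in VRAM.
  cmd="llama-server -m ${MODEL_DIR:-$HOME/models}/$model -ngl 99 --host 127.0.0.1 --port 8080 $*"
  if [ -n "$DRY_RUN" ]; then
    echo "$cmd"   # print the final command instead of launching it
  else
    exec $cmd
  fi
}
```

Running `DRY_RUN=1 run_llama my-model.gguf -ncmoe 6` prints the assembled command, which is handy for checking flags before committing VRAM.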
Section 04

Benchmark Evidence: RX 7900 GRE Performance

The project provides performance data for the RX 7900 GRE (16 GB):

| GPU | Arch | Model | Quant | Mode | Gen tok/s | Prompt tok/s |
| --- | --- | --- | --- | --- | --- | --- |
| RX 7900 GRE | gfx1100 | Moonlight-16B-A3B-Instruct | Q6_K | Full GPU | 100.2 | 188.1 |
| RX 7900 GRE | gfx1100 | gemma-4-26B-A4B-it | UD-Q4_K_M | MoE offload (-ncmoe 6) | 31.0 | 61.3 |
| RX 7900 GRE | gfx1100 | Qwen3.6-35B-A3B-UD | Q4_K_S | MoE offload (-ncmoe 32) | 22.7 | 41.7 |
| RX 7900 GRE | gfx1100 | gemma-4-26B-A4B-it | UD-Q6_K | MoE offload (-ncmoe 16) | 17.3 | 80.7 |

Two modes: Full GPU (maximum speed, limited by VRAM capacity) vs. MoE offload (runs larger models, but generation is 3-5x slower because the expert layers kept in system RAM are bound by DDR5 bandwidth).
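When choosing a configuration, rows like those above can be ranked straight from a results.csv-style file. A minimal sketch, assuming a simple comma-separated layout (gpu, arch, model, quant, mode, gen tok/s, prompt tok/s); the repository's actual column schema may differ.

```shell
# Toy results.csv-style data; the column layout is an assumption,
# not the repo's exact schema.
cat > /tmp/results.csv <<'EOF'
RX 7900 GRE,gfx1100,Moonlight-16B-A3B-Instruct,Q6_K,Full GPU,100.2,188.1
RX 7900 GRE,gfx1100,gemma-4-26B-A4B-it,UD-Q4_K_M,MoE offload,31.0,61.3
RX 7900 GRE,gfx1100,Qwen3.6-35B-A3B-UD,Q4_K_S,MoE offload,22.7,41.7
EOF

# Rank configurations by generation tok/s (field 6, descending).
sort -t, -k6,6 -nr /tmp/results.csv |
  awk -F, '{printf "%-28s %-12s %6.1f tok/s\n", $3, $5, $6}'
```

The full-GPU Moonlight row comes out on top, matching the table's ordering by generation speed.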

Section 05

AMD GPU & ROCm Support Matrix

The project maintains a support matrix:

| Code | Architecture family | Example models | ROCm support status |
| --- | --- | --- | --- |
| gfx1100 | RDNA3 | RX 7900 XTX/XT/GRE | ✅ Mature |
| gfx1101 | RDNA3 | RX 7800 XT / 7700 XT | ✅ Usable |
| gfx1102 | RDNA3 | RX 7600 | ⚠️ Partial support |
| gfx1200/1201 | RDNA4 | RX 9070 XT / 9060 | ✅ Recently supported |
| gfx1150/1151 | Strix Halo | Ryzen AI Max+ 395 | ⚠️ Bleeding edge |
| gfx942/950 | CDNA3 | MI300X/MI325X | ✅ Data center |

Strix Halo APUs currently need manual patches or specific kernel versions for ROCm support.
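On parts without official kernels, a common (unofficial) workaround is to make the ROCm runtime treat the GPU as the nearest supported gfx target via the HSA_OVERRIDE_GFX_VERSION environment variable. A hedged sketch; the value 11.0.0 (i.e. gfx1100) is only an example, and the right choice depends on your chip.

```shell
# List the gfx targets ROCm actually detects (skipped if rocminfo is absent).
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
fi

# Unofficial workaround: force the runtime to use kernels for a nearby
# supported target. 11.0.0 corresponds to gfx1100; pick the closest match
# for your chip, and expect breakage on bleeding-edge parts like Strix Halo.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Set the variable before launching llama-server or any other ROCm workload; it only affects processes that inherit the environment.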

Section 06

Ways to Contribute

Contribute via:

  • Benchmarks: Submit PR to results.csv or use GitHub Issue template.
  • Pitfalls: Add to COOKBOOK.md.
  • Models: Create model-specific docs in models/.
  • Hardware: Add GPU docs in hardware/.

Use GitHub Discussions for general topics (hardware suggestions, 'is it worth it' questions) and Issues for bugs.

Section 07

Significance for Local AI Ecosystem

Team Red LLM shows that AMD hardware can handle local LLM inference through community collaboration. It lowers the entry barrier for budget users (e.g., used RX 6000/7000-series cards) and reduces AMD users' reliance on cloud APIs. The project is MIT-licensed, encouraging free use and modification.