# On-Device LLM Inference Test: Mobile Thermal Management is the Main Bottleneck, NPU Energy Efficiency Ratio Shines

> Tests of Qwen 2.5 1.5B on a Raspberry Pi 5 with a Hailo-10H NPU, a Samsung Galaxy S24 Ultra, an iPhone 16 Pro, and a laptop RTX 4050 show that the iPhone's throughput dropped by nearly half after two iterations, the S24 Ultra hit a system-enforced GPU frequency floor, and the Hailo-10H NPU reached energy efficiency comparable to the RTX 4050 while drawing less than 2 W.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-03-24T18:28:38.000Z
- Last activity: 2026-03-27T05:22:35.145Z
- Popularity: 77.1
- Keywords: on-device inference, mobile NPU, thermal management, energy efficiency, Qwen
- Page link: https://www.zingnex.cn/en/forum/thread/llm-npu
- Canonical: https://www.zingnex.cn/forum/thread/llm-npu
- Markdown source: floors_fallback

---


## Test Setup

Model: Qwen 2.5 1.5B (4-bit quantization)

Platforms:
- Raspberry Pi 5 + Hailo-10H NPU
- Samsung Galaxy S24 Ultra
- iPhone 16 Pro
- Laptop RTX 4050 GPU

Test conditions: 258-token prompt, 20 consecutive iterations for thermal testing
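
A repeated-run measurement like the one described above could be sketched as follows. This is a minimal illustration, not the harness used in the test: `run_thermal_iterations`, the `generate` callable (assumed to return the number of tokens it produced), and the injectable `clock` are all hypothetical names introduced here.

```python
import time
from typing import Callable, List


def run_thermal_iterations(
    generate: Callable[[str], int],
    prompt: str,
    rounds: int = 20,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Run `generate` back to back and return tokens/second for each round.

    `generate` is a hypothetical callable that runs one inference pass on the
    prompt and returns how many tokens it generated.
    """
    throughputs = []
    for _ in range(rounds):
        start = clock()
        n_tokens = generate(prompt)
        # Guard against a zero interval when `generate` returns instantly.
        elapsed = max(clock() - start, 1e-9)
        throughputs.append(n_tokens / elapsed)
    return throughputs
```

Running the rounds with no cooldown in between is what lets heat accumulate, so the per-round list shows how throughput changes as the device warms up.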

## Mobile Devices: Thermal Management is the Primary Constraint

- **iPhone 16 Pro**: Throughput dropped by nearly 50% after two iterations
- **S24 Ultra**: Hit a system-enforced GPU frequency floor, and inference terminated completely
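
Given a list of per-iteration throughputs, degradation like this can be flagged by comparing each round with the first one. The sketch below assumes a hypothetical helper `first_throttled_round` and a 50% drop threshold chosen only for illustration; the throughput values in the usage lines are made up, not measurements from this test.

```python
from typing import List, Optional


def first_throttled_round(
    throughputs: List[float], drop_fraction: float = 0.5
) -> Optional[int]:
    """Return the 1-based round where throughput first falls by at least
    `drop_fraction` relative to round 1, or None if it never does."""
    if not throughputs:
        return None
    threshold = throughputs[0] * (1.0 - drop_fraction)
    for round_number, tok_per_s in enumerate(throughputs, start=1):
        if tok_per_s <= threshold:
            return round_number
    return None


# Illustrative values only: a gradual slowdown and a run that stops outright.
gradual = first_throttled_round([20.0, 19.0, 10.5, 10.0])
terminated = first_throttled_round([20.0, 0.0])
```

A run that terminates can be recorded as 0 tok/s for the failed round, so the same check covers both throttling and outright termination.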

## Dedicated Hardware: Different Constraints Dominate

| Platform | Throughput | Power | Notes |
|----------|------------|-------|-------|
| RTX 4050 | 131.7 tok/s | 34.1 W | Limited by the battery power cap |
| Hailo-10H | 6.9 tok/s | <2 W | Limited by module memory bandwidth, near-zero variance |

## Energy Efficiency Ratio Surprise

The Hailo-10H NPU stood out:
- Energy efficiency comparable to the RTX 4050
- Throughput only about 1/19 of the RTX 4050's
- Power draw below 2 W
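
The comparison can be checked with a little arithmetic on the reported figures, since tokens per second divided by watts gives tokens per joule. The sketch below assumes the Hailo-10H draws exactly 2 W, which is the upper bound of the reported "<2 W", so its result is a lower bound on the NPU's efficiency; `tokens_per_joule` is a name introduced here for illustration.

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Energy efficiency: (tok/s) / (J/s) = tok/J."""
    return tokens_per_second / watts


# Figures from the table in this post.
rtx_efficiency = tokens_per_joule(131.7, 34.1)  # about 3.86 tok/J

# Assumption: 2.0 W is the upper bound of "<2 W", so this is a lower bound.
hailo_efficiency_lower_bound = tokens_per_joule(6.9, 2.0)  # 3.45 tok/J

throughput_ratio = 131.7 / 6.9  # about 19.1, i.e. roughly 1/19 the throughput

print(f"RTX 4050:  {rtx_efficiency:.2f} tok/J")
print(f"Hailo-10H: >= {hailo_efficiency_lower_bound:.2f} tok/J")
print(f"Throughput ratio: {throughput_ratio:.1f}x")
```

Under that assumption the two platforms land within roughly 11% of each other in tokens per joule, even though their throughput differs by about 19x.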

## Deployment Insights

For always-on personal assistant scenarios:
- Peak compute matters less than thermal management capability
- NPUs have a significant advantage in energy efficiency
- Platform-level optimization requires hardware and software to be considered together
