Running Large Language Models on Snapdragon X Elite: Practice of NPU-Accelerated On-Device AI Inference

This article introduces how to run large language model inference on Windows ARM64 devices equipped with Snapdragon X Elite/X2 Elite, using Qualcomm NPU and ONNX Runtime QNN Execution Provider to achieve efficient on-device AI computing.

Tags: Snapdragon X Elite, NPU, On-Device AI, ONNX Runtime, QNN, ARM64, Large Language Models, Inference Acceleration
Published 2026-04-21 02:39 · Recent activity 2026-04-21 02:55 · Estimated read 6 min
Section 01

Introduction

This article walks through running large language model inference on Windows ARM64 devices powered by the Snapdragon X Elite/X2 Elite, using the Qualcomm NPU together with the ONNX Runtime QNN Execution Provider for efficient on-device AI computing.


Section 02

The Rise of On-Device AI

As large language models grow more capable, AI computing is migrating from the cloud to end devices. On-device AI offers significant advantages, such as privacy protection, low latency, and offline availability, and all of these depend on dedicated AI acceleration hardware. The Qualcomm Snapdragon X Elite platform is a key driver of this trend.


Section 03

Hardware Architecture

Snapdragon X Elite is Qualcomm's flagship ARM processor for Windows PCs, with core highlights including:

Hexagon NPU

  • Computing Power: Up to 45 TOPS (trillions of operations per second) of AI computing power
  • Dedicated Design: A dedicated processor optimized for neural network inference
  • Energy Efficiency: Several times higher energy efficiency for AI tasks compared to traditional CPU/GPU

Oryon CPU

  • Performance Cores: 12 high-performance cores, deeply customized based on ARM architecture
  • Energy-Efficiency Balance: Intelligent scheduling achieves the best balance between performance and battery life
  • x86 Compatibility: Runs traditional Windows applications via an emulation layer

Adreno GPU

  • Graphics Performance: Supports high-quality graphics rendering
  • AI Collaboration: Can work with NPU to handle hybrid AI workloads

Section 04

Market Positioning

Snapdragon X Elite targets the high-end thin and light laptop market, focusing on:

  • Ultra-Long Battery Life: The energy efficiency advantages of ARM architecture bring all-day battery life
  • AI-Native: Provides hardware acceleration for AI applications at the chip level
  • Thin and Light Design: Low-power characteristics support fanless design

Section 05

Introduction to ONNX Runtime

ONNX Runtime is a cross-platform machine learning inference accelerator developed by Microsoft, supporting:

  • Multi-Framework Compatibility: Models from frameworks like PyTorch and TensorFlow can be converted to ONNX format
  • Hardware Acceleration: Supports multiple backends such as CPU, GPU, and NPU
  • Performance Optimization: Advanced optimization techniques like graph optimization and operator fusion
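As an illustration of this multi-backend design, here is a minimal Python sketch. It is not the article's code: the provider preference list and model path are assumptions, and only CPUExecutionProvider ships in every onnxruntime wheel; the others require matching hardware and packages.

```python
"""Minimal sketch of ONNX Runtime's multi-backend model (illustrative)."""


def choose_providers(preferred, available):
    """Return the preferred providers that are actually available,
    keeping preference order, with CPU as the guaranteed fallback."""
    picked = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in picked:
        picked.append("CPUExecutionProvider")
    return picked


def run_model(model_path, input_feed):
    # Imported here so choose_providers stays usable without onnxruntime.
    import onnxruntime as ort

    providers = choose_providers(
        ["QNNExecutionProvider", "CUDAExecutionProvider"],  # assumed preference
        ort.get_available_providers(),
    )
    session = ort.InferenceSession(model_path, providers=providers)
    return session.run(None, input_feed)
```

The same script then runs unchanged on an NPU-equipped laptop or a plain x86 desktop; only the resolved provider list differs.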

Section 06

Qualcomm QNN (Qualcomm Neural Network)

QNN is a neural network inference SDK provided by Qualcomm, with features including:

Hardware Abstraction Layer

  • Unified Interface: Provides a consistent API for different Qualcomm platforms
  • Backend Optimization: Deeply optimized for Hexagon NPU
  • Quantization Support: Low-precision quantization acceleration for INT8, INT4, etc.

Model Compilation

  • Offline Compilation: Precompiles models into device-specific formats
  • Runtime Optimization: Dynamic graph optimization and memory management
  • Caching Mechanism: Avoids repeated compilation overhead
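ONNX Runtime surfaces this compile-once, cache-forever workflow through its "EP context" session options. The sketch below is illustrative only: the file paths are placeholders, and the `ep.context_*` config keys and the `QnnHtp.dll` backend path should be verified against the ONNX Runtime QNN documentation for your installed version.

```python
import os


def context_plan(model_path, ctx_path):
    """Decide what to load and which session-config entries to set.

    Pure logic, so it can be exercised without onnxruntime installed:
    first run compiles and dumps a context; later runs load the blob.
    """
    if os.path.exists(ctx_path):
        # A compiled context already exists: load it directly, no recompile.
        return ctx_path, {}
    # First run: compile the graph and ask the QNN EP to dump a context model.
    return model_path, {
        "ep.context_enable": "1",
        "ep.context_file_path": ctx_path,
    }


def make_cached_session(model_path, ctx_path):
    # Requires the onnxruntime-qnn wheel on a Snapdragon device.
    import onnxruntime as ort

    to_load, entries = context_plan(model_path, ctx_path)
    so = ort.SessionOptions()
    for key, value in entries.items():
        so.add_session_config_entry(key, value)
    return ort.InferenceSession(
        to_load,
        sess_options=so,
        providers=[("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})],
    )
```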

Section 07

QNN Execution Provider

This is a dedicated execution provider for ONNX Runtime on Qualcomm platforms:

  • Seamless Integration: ONNX models can directly use the QNN backend
  • Performance Advantage: Fully leverages the computing power of Hexagon NPU
  • Development Convenience: Can switch backends without modifying model code
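A hedged sketch of what "switching backends without modifying model code" looks like in Python: the provider option names below follow the ONNX Runtime QNN EP documentation but should be checked against your installed version, and `model_path` is a placeholder for any ONNX model prepared for the NPU.

```python
def qnn_provider_options(performance_mode="burst"):
    """Provider options for the QNN Execution Provider (verify the keys
    against your ONNX Runtime version). QnnHtp.dll targets the Hexagon
    NPU; QnnCpu.dll is the reference CPU backend."""
    return {
        "backend_path": "QnnHtp.dll",
        "htp_performance_mode": performance_mode,
    }


def load_on_npu(model_path):
    # Requires the onnxruntime-qnn wheel on a Snapdragon device.
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path,  # placeholder: e.g. a quantized LLM exported to ONNX
        providers=[
            ("QNNExecutionProvider", qnn_provider_options()),
            "CPUExecutionProvider",  # fallback for unsupported operators
        ],
    )
```

Because the fallback is part of the provider list, operators the NPU cannot handle are assigned to the CPU automatically rather than failing the session.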

Section 08

Environment Preparation

Hardware Requirements

  • Snapdragon X Elite or X2 Elite device
  • Windows 11 ARM64 version
  • Sufficient system memory (16GB or more recommended)

Software Dependencies

The following components are required:

  1. Visual Studio 2022: For C++ development environment
  2. Python 3.11 ARM64: Native ARM64 Python interpreter
  3. ONNX Runtime QNN Package: Special version containing QNN Execution Provider
  4. Qualcomm AI Stack: QNN SDK and related tools
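Once these are installed, a small check confirms that the QNN Execution Provider is visible. The `onnxruntime-qnn` package name and the `QNNExecutionProvider` string follow the ONNX Runtime documentation; treat them as assumptions to verify for your setup.

```python
def report(providers):
    """Summarize whether the QNN execution provider is visible."""
    if "QNNExecutionProvider" in providers:
        return "QNN EP available: NPU acceleration possible"
    return "QNN EP missing: inference will fall back to CPU"


def check_environment():
    # Install first:  pip install onnxruntime-qnn   (Windows 11 ARM64)
    import onnxruntime as ort

    print("onnxruntime", ort.__version__)
    print(report(ort.get_available_providers()))
```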