Zing Forum


Running Large Language Models on WASM: Cross-Platform LLM Inference with WACS Runtime

The WACS project, based on a pure .NET WASM runtime, supports running real LLMs and small ML models in WebAssembly sandboxes, enabling cross-platform, pluggable-backend AI inference.

Tags: WebAssembly, WASM, LLM, WASI-NN, WACS, Cross-Platform Inference, GGUF, ONNX, Rust
Published 2026-05-11 16:10 · Recent activity 2026-05-11 16:25 · Estimated read: 7 min

Section 01

Introduction: WACS Runtime – Enabling Cross-Platform LLM Inference in WASM Sandboxes

WACS is a WebAssembly runtime implemented purely in .NET, based on the WASI-NN standard. It supports running Large Language Models (LLMs) and small machine learning models in WASM sandboxes, addressing core issues of cross-platform deployment, security isolation, and high-performance inference. Through a pluggable backend architecture, the project fully decouples model inference logic from the underlying platform, achieving the goal of "compile once, run anywhere".


Section 02

Background: Two Key Challenges in AI Deployment

As AI applications proliferate, developers face two key challenges: first, the diversity of deployment environments (different operating systems and hardware architectures) makes it difficult to run LLMs seamlessly everywhere; second, achieving high-performance inference while maintaining security isolation. The WACS project was developed to address both.


Section 03

WACS Definition and Core Design Philosophy

WACS (WebAssembly Component System) is a WASM runtime implemented purely in .NET, specifically designed for WASI-NN (WebAssembly System Interface for Neural Networks). Its core design philosophy includes:

  1. Sandboxed Isolation: WASM client components written in Rust can be compiled once and run cross-platform;
  2. Pluggable Backend Architecture: Dynamically bind inference backends via the --bind parameter without modifying client code;
  3. Zero Native Dependencies: Pure .NET tooling with no FFI glue code or complex native libraries, making deployment simple.

Section 04

Supported Backends and Models

WACS currently supports five mainstream inference backends:

  • LlamaSharp — model format: GGUF; typical model: Qwen2.5 0.5B Instruct (~352MB); hardware acceleration: Metal/CUDA
  • OnnxRuntime — model format: ONNX; typical model: Gemma3 270M (~1.14GB); hardware acceleration: general purpose
  • OnnxRuntimeGenAI — model format: GenAI; typical model: Gemma3 270M Instruct (~864MB); hardware acceleration: KV cache optimization
  • TorchSharp — model format: TorchScript; typical model: XOR MLP (~6KB); hardware acceleration: libtorch
  • ML.NET — model format: classical ML; typical model: custom models; hardware acceleration: general purpose

Section 05

Technical Architecture Analysis

WACS's technical architecture is divided into three layers:

  1. WASM Client Layer: Compiled using the wasm32-wasip2 target (WASI Preview2 component model), implementing four core functions: set_input, compute, get_output, and load_by_name;
  2. Backend NuGet Packages: Each backend corresponds to an independent NuGet package (e.g., WACS.WASI.NN.LlamaSharp binds llama.cpp + Metal/CUDA);
  3. Automatic Hardware Detection: Automatically adapts to Apple Silicon (Metal), Linux/Windows GPUs (CUDA), and optimal CPU instruction sets (AVX/NEON).

Section 06

Quick Start Guide

Environment requirements: .NET SDK 8 or 9, Rust + Cargo, and approximately 3GB of disk space. One-click setup steps:

  1. Clone the repository: git clone https://github.com/kelnishi/LLM-on-WASM.git
  2. Run the setup script: ./scripts/setup.sh (installs the WACS.Cli global tool and configures backend NuGet packages)

Run the examples:

  • GGUF model: ./scripts/run-llm.sh -v
  • ONNX model: ./scripts/run-slm.sh -v
  • Switch backends: specify a different backend DLL via wacs run --bind

Section 07

Significance and Future Outlook of WACS

Significance:

  • Portability: Model inference logic is fully decoupled from the underlying platform;
  • Security Isolation: WASM sandbox provides memory safety and capability security guarantees;
  • Flexible Deployment: Supports edge devices, cloud services, desktop applications, and other scenarios;
  • Development Experience: Write client logic in Rust without worrying about ML framework details.

Future Outlook: broader model format support, efficient quantization schemes, distributed inference orchestration, browser-native LLM inference, and more.

Section 08

Summary

The WACS runtime combines the portability and security of WebAssembly with the standardization advantages of WASI-NN, offering a new approach to AI deployment. It brings "compile once, run anywhere" to LLMs, making it an innovative project worth following for AI developers who need cross-platform deployment, security isolation, and deployment flexibility.