Zing Forum

TOSA: ARM's Tensor Operator Set Architecture for Deep Learning

TOSA is an open tensor operator set architecture specification led by ARM. It gives standardized definitions of whole-tensor operations used in deep learning networks, so that models can be ported across hardware platforms and compiled with optimizations.

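Tags: TOSA, tensor operations, deep learning, ARM, hardware standardization, MLIR, neural networks, compilers, AI accelerators, open-source specification
Published 2026-05-21 00:15 | Recent activity 2026-05-21 00:20 | Estimated read 7 min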

Section 01

Introduction: ARM's Tensor Operator Set Architecture (TOSA)

TOSA is an open tensor operator set architecture specification led by ARM that aims to reduce fragmentation across deep learning hardware. It gives standardized definitions of whole-tensor operations used in deep learning networks, so that models can be ported across hardware platforms and compiled with optimizations. Positioned as a standardized intermediate representation layer between deep learning frameworks and the underlying hardware, it works toward a "write once, run anywhere" model for neural networks.

Section 02

Urgent Need for Deep Learning Hardware Standardization

As artificial intelligence has advanced, deep learning models have come into wide use in fields such as image recognition and natural language processing, and hardware fragmentation has become a growing problem. Different platforms, from data center GPU clusters to edge AI accelerators, expose different instruction sets and operation primitives. Models therefore port poorly between them, and developers end up re-optimizing the same model for each target. ARM introduced the TOSA specification to address this challenge.

Section 03

Core Positioning and Design Principles of TOSA

TOSA stands for Tensor Operator Set Architecture. It is an open, hardware-agnostic specification that defines a set of whole-tensor operations commonly used in deep learning networks. It does not replace existing frameworks such as TensorFlow and PyTorch; it serves as an intermediate representation layer between frameworks and hardware. Its key design principles include: hardware abstraction (defining operation semantics rather than specific implementations), whole-tensor operations (focusing on core workloads like convolution and matrix multiplication), static shape friendliness (facilitating compiler optimization), and verifiability (a reference implementation and test suites accompany the specification).
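
To make "semantics rather than implementation" and "static shape friendliness" concrete, here is a minimal Python sketch. It is not taken from the TOSA specification or its reference model. The function names are hypothetical, and the batched layout ([N, H, C] times [N, C, W]) is an assumption chosen for illustration. The loop version only states what the result must be. A hardware backend is free to compute it in any way that produces the same values.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]
Tensor3 = List[List[List[int]]]


def matmul_output_shape(a_shape: Shape, b_shape: Shape) -> Shape:
    """Infer the output shape from static input shapes, before any data exists."""
    if len(a_shape) != 3 or len(b_shape) != 3:
        raise ValueError("expected rank-3 inputs [N, H, C] and [N, C, W]")
    n, h, c = a_shape
    n2, c2, w = b_shape
    if n != n2 or c != c2:
        raise ValueError(f"incompatible shapes {a_shape} and {b_shape}")
    return (n, h, w)


def reference_matmul(a: Tensor3, b: Tensor3) -> Tensor3:
    """Plain-loop reference: defines the expected result, not a fast implementation."""
    a_shape = (len(a), len(a[0]), len(a[0][0]))
    b_shape = (len(b), len(b[0]), len(b[0][0]))
    n, h, w = matmul_output_shape(a_shape, b_shape)
    c = a_shape[2]
    return [
        [
            [sum(a[i][y][k] * b[i][k][x] for k in range(c)) for x in range(w)]
            for y in range(h)
        ]
        for i in range(n)
    ]
```

Because the shapes are known ahead of time, a compiler can run the shape check once at compile time and plan memory for the (N, H, W) result without executing the model.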

Section 04

Content and Technical Features of the TOSA Specification

The TOSA specification is written in AsciiDoc. For each operator it details the input and output shapes, data types, numerical behavior, and boundary handling. The main operator categories include: convolution and matrix operations (2D/3D convolution, fully connected layers, etc.), activation functions (ReLU, Sigmoid, etc.), tensor operations (Reshape, Transpose, etc.), normalization and pooling (Average Pool, Layer Normalization, etc.), element-wise operations (Add, Mul, etc.), and quantization support (low-precision operations such as INT8/INT16). The exact operator set depends on the specification version, and some framework-level layers are expressed as combinations of more primitive TOSA operators. The specification also strictly defines numerical precision, including the precision of intermediate results, rounding modes, overflow handling, and quantization formulas, so that different hardware produces consistent results.
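
The point about rounding modes and overflow handling can be illustrated with a small fixed-point rescale in Python. This is a simplified sketch and not the normative pseudocode from the specification. The function names, the multiplier/shift encoding of the scale, and the round-half-up choice are assumptions made for the example.

```python
def apply_scale(value: int, multiplier: int, shift: int) -> int:
    """Compute value * multiplier / 2**shift in integer arithmetic, rounding half up."""
    if shift < 1:
        raise ValueError("shift must be >= 1")
    rounding = 1 << (shift - 1)
    # Python's >> on negative ints is an arithmetic (floor) shift,
    # so adding 2**(shift-1) first gives round-half-up behavior.
    return (value * multiplier + rounding) >> shift


def saturate_int8(value: int) -> int:
    """Overflow handling: clamp to the INT8 range instead of wrapping around."""
    return max(-128, min(127, value))


def rescale_to_int8(acc: int, multiplier: int, shift: int, output_zp: int) -> int:
    """Scale a wide accumulator down to INT8 and add the output zero point."""
    return saturate_int8(apply_scale(acc, multiplier, shift) + output_zp)
```

With `multiplier = 1 << 30` and `shift = 31` the scale is exactly 0.5, so `apply_scale(3, 1 << 30, 31)` yields 2 and `apply_scale(-3, 1 << 30, 31)` yields -1. Two backends that disagreed on this tie-breaking rule would produce different INT8 outputs, which is the kind of divergence a precise specification rules out.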

Section 05

Toolchain and Ecosystem Value of TOSA

The specification comes with its own build tooling: Asciidoctor, Make, and Python are used to generate the HTML and PDF documents, and pre-commit hooks enforce quality checks. In terms of ecosystem value, framework developers can convert models to the TOSA intermediate representation, which lowers the cost of supporting new hardware; hardware vendors only need to implement TOSA to be compatible with multiple frameworks; and end users can deploy models with less per-platform rework. TOSA is also available as a first-class MLIR dialect, which allows operator-level transformations and optimizations as well as interoperability with other dialects.
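
The ecosystem argument is essentially a counting argument, sketched below in Python. The function name and the numbers in the usage note are hypothetical. The sketch assumes every framework would otherwise need a dedicated path to every hardware backend, which is a simplification.

```python
def integration_paths(num_frameworks: int, num_backends: int, shared_ir: bool) -> int:
    """Count the converters/backends that must be written and maintained."""
    if shared_ir:
        # Each framework targets the shared IR once; each backend consumes it once.
        return num_frameworks + num_backends
    # Without a shared IR, every framework/backend pair needs its own path.
    return num_frameworks * num_backends
```

Under this assumption, 3 frameworks and 5 hardware targets need 15 direct integrations, but only 8 pieces when both sides meet at a shared representation such as TOSA.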

Section 06

Practical Application Scenarios of TOSA

TOSA has been applied in several scenarios: edge AI chips (e.g., ARM Ethos series accelerators use TOSA as a high-level interface); compiler toolchains (the TensorFlow Lite TOSA converter, IREE, and others support TOSA as an intermediate representation); and model optimization (TOSA-based compilers can apply transformations such as convolution-activation fusion and memory layout optimization to improve performance).
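
As a rough illustration of convolution-activation fusion, the following Python sketch runs a fusion pass over a toy operator list. The `Op` class and `fuse_conv_activation` function are hypothetical and do not come from any TOSA or MLIR API. The sketch also assumes a purely linear graph in which each operator consumes only the output of the one before it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Op:
    """Toy operator node: a name plus an optional fused activation."""
    name: str
    fused_activation: str = ""


def fuse_conv_activation(ops: List[Op], activations: Sequence[str] = ("clamp",)) -> List[Op]:
    """Fold an activation into the conv2d that immediately precedes it."""
    fused: List[Op] = []
    for op in ops:
        prev = fused[-1] if fused else None
        can_fuse = (
            op.name in activations
            and prev is not None
            and prev.name == "conv2d"
            and not prev.fused_activation
        )
        if can_fuse:
            # Replace the conv with a version that applies the activation in place.
            fused[-1] = Op(prev.name, fused_activation=op.name)
        else:
            fused.append(op)
    return fused
```

For the chain `conv2d, clamp, add` the pass returns two operators, a `conv2d` with `fused_activation="clamp"` followed by `add`, so the intermediate convolution output no longer has to be written to memory and read back. A real compiler would also have to check that the convolution result has no other consumers before fusing.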

Section 07

Summary and Outlook of TOSA

TOSA is an important step toward standardizing deep learning hardware. It eases ecosystem fragmentation by providing a common bridge between frameworks, compilers, and hardware, and it has seen the most adoption in edge AI. TOSA is expected to keep evolving to meet the needs of large models such as Transformers, and its approach may also serve as a reference for newer paradigms such as quantum and neuromorphic computing. Understanding TOSA helps in understanding the software stack of modern AI systems.