Zing Forum

Tensora: An Adaptive Checkpoint Loading Framework for Large Language Models

Tensora is an open-source Rust framework for loading large-model checkpoints. It uses workload-aware heuristics to choose an I/O strategy automatically and supports the SafeTensors and ServerlessLLM storage formats, with the goal of reducing load time for large models.

Tags: Rust, LLM, checkpoint loading, io_uring, SafeTensors, ServerlessLLM, I/O optimization, machine learning infrastructure
Published 2026-06-08 00:12 · Recent activity 2026-06-08 00:23 · Estimated read 5 min

2

Section 02

Original Author and Source

  • Original Author/Maintainer: Botir Khaltaev
  • Source Platform: GitHub
  • Original Title: tensora: Adaptive checkpoint loading for large language models
  • Original Link: https://github.com/botirk38/tensora
  • Release Time: 2026
  • License: Apache 2.0
3

Section 03

Project Background and Core Issues

Deploying a modern large language model means loading very large model weights into GPU memory as quickly as possible. Storage formats such as Hugging Face's SafeTensors and the ServerlessLLM format have different access patterns, and the hardware platform and file system configuration also have a significant effect on I/O performance.

Traditional approaches usually fix a single loading strategy, such as always using memory mapping (mmap) or always using asynchronous I/O, and a fixed strategy cannot adapt to diverse workloads. Tensora's core insight is that no single I/O strategy is optimal in all scenarios, so the choice should be made dynamically from checkpoint size, shard structure, and platform capabilities.

4

Section 04

Technical Architecture and Core Mechanisms

Tensora adopts a modular architecture, separating I/O backends, storage formats, and conversion pipelines, making the system both flexible and extensible.

5

Section 05

Multi-Backend Support

The framework implements four main I/O backends:

  1. Synchronous POSIX Backend: Uses thread-parallel block reads; suited to small, single-shard SafeTensors files
  2. Tokio Asynchronous Backend: Built on Rust's Tokio async runtime; suited to the range-access-heavy ServerlessLLM format
  3. Linux io_uring Backend: Uses io_uring, the Linux kernel's asynchronous I/O interface (available since Linux 5.1); suited to large multi-shard files (≥4GB)
  4. Memory Mapping Backend: Maps files directly into the process address space via mmap
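
To make "thread-parallel block reading" concrete, here is a minimal sketch using only the Rust standard library on Unix. It is not Tensora's code: the function name `read_parallel`, the even split into contiguous chunks, and the one-thread-per-chunk layout are all assumptions made for illustration.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // positional reads (pread) on Unix
use std::path::Path;
use std::thread;

/// Hypothetical helper (not Tensora's API): read a whole file by splitting it
/// into at most `workers` contiguous chunks and issuing one positional read
/// per chunk, each from its own scoped thread.
pub fn read_parallel(path: &Path, workers: usize) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    let mut buf = vec![0u8; len];
    if len == 0 {
        return Ok(buf);
    }
    let workers = workers.max(1);
    // Ceiling division so the chunks cover the whole file.
    let chunk = (len + workers - 1) / workers;

    thread::scope(|s| {
        let mut handles = Vec::new();
        for (i, slice) in buf.chunks_mut(chunk).enumerate() {
            let file = &file;
            let offset = (i * chunk) as u64;
            // Each thread fills its own disjoint slice of the output buffer.
            handles.push(s.spawn(move || file.read_exact_at(slice, offset)));
        }
        for h in handles {
            h.join().expect("reader thread panicked")?;
        }
        Ok::<(), io::Error>(())
    })?;
    Ok(buf)
}
```

Positional reads let all threads share one file descriptor without seeking, and the disjoint `chunks_mut` slices let them write into one buffer without locking.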
6

Section 06

Adaptive Heuristic Algorithm

Tensora's core innovation is its adaptive default backend. Based on storage format, checkpoint size, and shard structure, the system picks a backend for each scenario:

Scenario | Recommended Backend | Mechanism
Small/single-shard SafeTensors | sync | Thread-parallel block POSIX reading
Large multi-shard SafeTensors (≥4GB) | io_uring | Multi-worker ring submission
Range-access ServerlessLLM | async | Tokio per-file task grouping
Large partitioned ServerlessLLM | io_uring | Batch submission and merging

This adaptive mechanism is intended to deliver near-optimal loading performance across different workloads without manual tuning by users.
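
The scenario-to-backend mapping above can be sketched as a small decision function. This is an illustrative reading of the mapping, not Tensora's actual heuristic. The type and function names are hypothetical, "4GB" is interpreted as 4 GiB, the same size rule is assumed for "large partitioned" ServerlessLLM checkpoints, and the fallback when io_uring is unavailable is an assumption. The mmap backend is left out because the article does not say when it is chosen.

```rust
/// Checkpoint storage formats named in the article.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    SafeTensors,
    ServerlessLlm,
}

/// Backends that appear in the scenario mapping (mmap omitted, see lead-in).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Sync,
    Async,
    IoUring,
}

/// Hypothetical description of a workload; field names are assumptions.
pub struct Workload {
    pub format: Format,
    pub total_bytes: u64,
    pub shard_count: usize,
    pub io_uring_available: bool,
}

/// Assumption: the article's "≥4GB" is read as 4 GiB.
const LARGE_THRESHOLD: u64 = 4 * 1024 * 1024 * 1024;

pub fn select_backend(w: &Workload) -> Backend {
    let large_multi = w.total_bytes >= LARGE_THRESHOLD && w.shard_count > 1;
    match (w.format, large_multi && w.io_uring_available) {
        // Large multi-shard or partitioned checkpoints go to io_uring.
        (_, true) => Backend::IoUring,
        // Otherwise fall back to the per-format default (assumed fallback).
        (Format::SafeTensors, false) => Backend::Sync,
        (Format::ServerlessLlm, false) => Backend::Async,
    }
}
```

Keeping platform capability (`io_uring_available`) as an explicit input means the same function also covers non-Linux hosts, where the io_uring branch is never taken.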

7

Section 07

Supported Storage Formats

Tensora natively supports two mainstream model storage formats:

8

Section 08

SafeTensors

SafeTensors, developed by Hugging Face, is widely used because loading it does not execute arbitrary code (unlike pickle-based checkpoints) and because it supports zero-copy reads. Tensora is optimized for the flat storage layout of SafeTensors and handles both single-shard and multi-shard variants.
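
As an illustration of that flat layout: a SafeTensors file, as publicly documented by Hugging Face, starts with an 8-byte little-endian header length, followed by a JSON header, followed by the raw tensor bytes. The sketch below reads the header with the standard library only. The function name is hypothetical, this is not how Tensora itself parses files, and JSON decoding is left to the caller.

```rust
use std::io::{self, Read};

/// Hypothetical helper: returns the JSON header text and the byte offset at
/// which the tensor data region begins. The `data_offsets` inside the JSON
/// are relative to that offset.
pub fn read_safetensors_header<R: Read>(mut reader: R) -> io::Result<(String, u64)> {
    // First 8 bytes: header length as an unsigned little-endian 64-bit integer.
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes)?;
    let header_len = u64::from_le_bytes(len_bytes);

    // Read at most `header_len` bytes; `take` avoids pre-allocating a huge
    // buffer if the length field is corrupt.
    let mut header = Vec::new();
    reader.by_ref().take(header_len).read_to_end(&mut header)?;
    if header.len() as u64 != header_len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated SafeTensors header",
        ));
    }

    let json = String::from_utf8(header)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok((json, 8 + header_len))
}
```

Because every tensor is a byte range at a known offset after the header, a loader can plan all of its reads from the header alone, which is what makes range-based and parallel I/O strategies practical for this format.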