Zing Forum

Tensora: An Adaptive Checkpoint Loading Framework for Large Language Models

Tensora is an open-source framework that uses workload-aware heuristics to automatically select an I/O strategy for loading LLM checkpoints. It supports multiple storage formats and I/O backends, with the goal of reducing model load time.

LLM | checkpoint loading | I/O optimization | Rust | SafeTensors | io_uring | asynchronous loading | model deployment
Published 2026-06-08 00:12 | Recent activity 2026-06-08 00:20 | Estimated read 5 min

Section 02

Original Author and Source

  • Original Author/Maintainer: Botir Khaltaev (botirk38)
  • Source Platform: GitHub
  • Original Title: tensora
  • Original Link: https://github.com/botirk38/tensora
  • Release Time: June 2026

Section 03

Background and Challenges

During the deployment and inference of Large Language Models (LLMs), checkpoint loading often becomes a performance bottleneck. As model sizes continue to grow, checkpoint files can reach tens or even hundreds of gigabytes, and traditional synchronous loading methods lead to significant startup delays. The optimal I/O strategy varies greatly across different scenarios: synchronous reading may be the fastest for small models with single shards, while large models with multiple shards require advanced techniques like asynchronous I/O or memory mapping.

Developers usually have to choose manually among several I/O backends (synchronous POSIX reads, Tokio asynchronous I/O, Linux io_uring, and memory mapping), each with its own suitable scenarios and limitations. This complexity makes deployment harder and makes suboptimal choices likely.


Section 04

Tensora Project Overview

Tensora is an open-source framework designed to speed up LLM checkpoint loading. It uses workload-aware heuristics to automatically pick the I/O strategy expected to be fastest, based on checkpoint size, shard structure, and platform capabilities.
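
To make the idea of workload-aware selection concrete, here is a minimal Rust sketch. It is not Tensora's actual heuristic: the `Backend`, `Workload`, and `choose_backend` names are hypothetical, the 4 GiB threshold is borrowed from the "≥4GB" figure quoted for io_uring in this post, and the Tokio fallback is an assumption made purely for illustration.

```rust
/// The four I/O backends named in the post.
/// `Mmap` is listed for completeness; this sketch never selects it because
/// the post does not say when memory mapping is preferred.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    SyncPosix,
    TokioAsync,
    IoUring,
    Mmap,
}

/// The inputs the post says the heuristic looks at (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct Workload {
    pub total_bytes: u64,
    pub shard_count: usize,
    pub io_uring_available: bool,
}

/// Assumed threshold: the post quotes ">=4GB" for the io_uring backend.
const LARGE_CHECKPOINT_BYTES: u64 = 4 * 1024 * 1024 * 1024;

/// Illustrative rule set, not the project's real one.
pub fn choose_backend(w: Workload) -> Backend {
    let large_multi_shard =
        w.total_bytes >= LARGE_CHECKPOINT_BYTES && w.shard_count > 1;
    if large_multi_shard && w.io_uring_available {
        Backend::IoUring
    } else if large_multi_shard {
        // Assumed fallback when io_uring is missing (e.g. non-Linux hosts).
        Backend::TokioAsync
    } else {
        // Small or single-shard checkpoints: plain synchronous reads.
        Backend::SyncPosix
    }
}
```

A real selector would presumably weigh more signals (storage type, page cache state, format), but the shape is the same: measure the workload, then map it to a backend.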

The framework supports two mainstream storage formats:

  • SafeTensors: A secure tensor format widely used in the Hugging Face ecosystem
  • ServerlessLLM: A storage layout optimized for serverless deployment
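
For readers unfamiliar with SafeTensors, the sketch below shows why the format is friendly to range reads: a file starts with an 8-byte little-endian header length, followed by a JSON header that records each tensor's dtype, shape, and byte offsets. The function name `read_safetensors_header` and the size cap are illustrative assumptions; this is not Tensora's parser, and it returns the raw JSON text rather than parsing it.

```rust
use std::io::{self, Read};

/// Illustrative cap so a corrupt length field cannot trigger a huge allocation.
const MAX_HEADER_BYTES: u64 = 100 * 1024 * 1024;

/// Reads the SafeTensors preamble and returns the JSON header as text.
/// The tensor data begins immediately after the returned header.
pub fn read_safetensors_header<R: Read>(mut reader: R) -> io::Result<String> {
    // First 8 bytes: header length as an unsigned little-endian integer.
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes)?;
    let header_len = u64::from_le_bytes(len_bytes);

    if header_len > MAX_HEADER_BYTES {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "SafeTensors header length exceeds the sketch's cap",
        ));
    }

    // Next `header_len` bytes: UTF-8 JSON describing every tensor.
    let mut header = vec![0u8; header_len as usize];
    reader.read_exact(&mut header)?;
    String::from_utf8(header)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because the offsets of every tensor are known after this one small read, a loader can plan all subsequent reads up front and issue them in whatever order or batch size suits the chosen backend.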

Section 05

Core Architecture and Multi-Backend Support

Tensora's architecture is pluggable and supports four main I/O backends:

Section 06

1. Synchronous POSIX Backend

This backend reads a file in blocks from several threads in parallel and is suited to small and medium-sized single-shard checkpoints. Issuing the block reads concurrently keeps multiple CPU cores busy instead of waiting on a single sequential read.
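
The thread-parallel block read can be sketched with nothing but the Rust standard library. This is an illustration under assumptions, not Tensora's code: `read_file_parallel` is a hypothetical name, the file is split into equal chunks (one per thread), and positional reads via `FileExt::read_exact_at` make the sketch Unix-only.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // positional reads (pread) on Unix
use std::path::Path;
use std::thread;

/// Reads a whole file into memory using `threads` parallel positional reads.
pub fn read_file_parallel(path: &Path, threads: usize) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    let mut buf = vec![0u8; len];
    if len == 0 {
        return Ok(buf);
    }

    // Ceiling division so the chunks cover the whole file.
    let threads = threads.max(1);
    let chunk = (len + threads - 1) / threads;

    thread::scope(|s| {
        // Each thread fills its own disjoint slice of the output buffer.
        let handles: Vec<_> = buf
            .chunks_mut(chunk)
            .enumerate()
            .map(|(i, slice)| {
                let file = &file;
                s.spawn(move || file.read_exact_at(slice, (i * chunk) as u64))
            })
            .collect();

        // Surface the first I/O error, if any.
        handles
            .into_iter()
            .try_for_each(|h| h.join().expect("reader thread panicked"))
    })?;

    Ok(buf)
}
```

Positional reads avoid sharing a file cursor between threads, so no locking is needed; each thread only touches its own slice of the buffer.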

Section 07

2. Tokio Asynchronous Backend

Built on Rust's Tokio runtime, this backend provides asynchronous I/O. It is particularly suited to workloads where reads are grouped by file, such as the range reads used for ServerlessLLM checkpoints.
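
One way to picture the grouping of reads by file is the planning step below, which would run before any asynchronous task is spawned. It is a sketch under assumptions: `RangeRead` and `group_by_file` are hypothetical names, and the Tokio calls themselves are left out so the example stays standard-library only.

```rust
use std::collections::BTreeMap;

/// A byte range to fetch from one checkpoint file (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RangeRead {
    pub file: String,
    pub offset: u64,
    pub len: u64,
}

/// Groups range reads by file and orders each group by offset, so that
/// each file can be opened once and handed to a single task.
pub fn group_by_file(requests: Vec<RangeRead>) -> BTreeMap<String, Vec<RangeRead>> {
    let mut groups: BTreeMap<String, Vec<RangeRead>> = BTreeMap::new();
    for req in requests {
        groups.entry(req.file.clone()).or_default().push(req);
    }
    for reads in groups.values_mut() {
        // Ascending offsets keep the access pattern within a file sequential.
        reads.sort_by_key(|r| r.offset);
    }
    groups
}
```

With the plan in this shape, an async runtime could spawn one task per map entry and await them together; how Tensora actually schedules its tasks is not described in the post.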

Section 08

3. Linux io_uring Backend

This backend uses io_uring, the Linux kernel's asynchronous I/O interface (available since Linux 5.1), with ring submission from multiple worker threads and batch merging of requests. It is the fastest backend for large multi-shard checkpoints (≥ 4 GB).
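
The post does not define "batch merging". One plausible reading is coalescing nearby read ranges into fewer, larger submissions before they are queued on the ring. The sketch below shows only that merging step, under that assumption: `merge_ranges` and `max_gap` are hypothetical names, and no io_uring calls are made.

```rust
/// Half-open byte range `[start, end)` within a single file.
pub type ByteRange = (u64, u64);

/// Coalesces ranges that overlap or lie within `max_gap` bytes of each other.
/// A non-zero `max_gap` trades a few wasted bytes for fewer I/O requests.
pub fn merge_ranges(mut ranges: Vec<ByteRange>, max_gap: u64) -> Vec<ByteRange> {
    ranges.sort_unstable();
    let mut merged: Vec<ByteRange> = Vec::with_capacity(ranges.len());
    for (start, end) in ranges {
        if let Some(prev) = merged.last_mut() {
            if start <= prev.1.saturating_add(max_gap) {
                // Close enough to the previous range: extend it.
                prev.1 = prev.1.max(end);
                continue;
            }
        }
        merged.push((start, end));
    }
    merged
}
```

For example, with `max_gap = 0` the ranges `(0, 50)` and `(50, 80)` become one read `(0, 80)`, while `(100, 200)` stays separate. Each merged range would then correspond to one submission queue entry instead of several.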