
Tensora: An Adaptive Checkpoint Loading Framework for Large Language Models

Tensora is an open-source Rust framework for loading large-model checkpoints. It uses workload-aware heuristics to select an I/O strategy automatically, supports the SafeTensors and ServerlessLLM storage formats, and aims to substantially reduce model loading time.

Tags: Rust, LLM, checkpoint loading, io_uring, SafeTensors, ServerlessLLM, I/O optimization, machine learning infrastructure

Published 2026-06-08 00:12 | Recent activity 2026-06-08 00:23 | Estimated read 5 min

Section 01

Introduction

Section 02

Original Author and Source

Original Author/Maintainer: Botir Khaltaev
Source Platform: GitHub
Original Title: tensora: Adaptive checkpoint loading for large language models
Original Link: https://github.com/botirk38/tensora
Release Year: 2026
License: Apache 2.0

Section 03

Project Background and Core Issues

Deploying modern large language models raises a common challenge: loading very large model weights into GPU memory as quickly as possible. Storage formats such as Hugging Face's SafeTensors and the ServerlessLLM format have different access patterns, and the hardware platform and file-system configuration also have a significant impact on I/O performance.

Traditional approaches usually fix a single loading strategy, such as always using memory mapping (mmap) or always using asynchronous I/O, which cannot adapt to diverse workloads. Tensora's core insight is that no single I/O strategy is optimal in every scenario; the choice should be made dynamically based on checkpoint size, shard structure, and platform capabilities.

Section 04

Technical Architecture and Core Mechanisms

Tensora adopts a modular architecture, separating I/O backends, storage formats, and conversion pipelines, making the system both flexible and extensible.
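
The write-up does not show Tensora's actual interfaces. As a rough illustration of how backends and formats can be kept separate, the Rust sketch below defines two hypothetical traits, `IoBackend` and `CheckpointFormat`, plus a minimal blocking `SyncBackend`; none of these names come from the Tensora codebase.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical I/O backend abstraction: a backend only knows how to read
/// byte ranges from a file, independent of any checkpoint format.
pub trait IoBackend {
    fn read_range(&self, path: &Path, offset: u64, len: usize) -> io::Result<Vec<u8>>;
}

/// Hypothetical storage-format abstraction: a format only knows where each
/// tensor lives, expressed as (name, file offset, length) triples.
pub trait CheckpointFormat {
    fn tensor_ranges(&self) -> Vec<(String, u64, usize)>;
}

/// A simple blocking backend built on std::fs (seek, then read the range).
pub struct SyncBackend;

impl IoBackend for SyncBackend {
    fn read_range(&self, path: &Path, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let mut file = File::open(path)?;
        file.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        file.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Loading composes the two: any format can be paired with any backend.
/// Stops at the first I/O error.
pub fn load_all(
    path: &Path,
    format: &dyn CheckpointFormat,
    backend: &dyn IoBackend,
) -> io::Result<Vec<(String, Vec<u8>)>> {
    format
        .tensor_ranges()
        .into_iter()
        .map(|(name, off, len)| backend.read_range(path, off, len).map(|bytes| (name, bytes)))
        .collect()
}
```

Because `load_all` only sees the two trait objects, a new backend (for example one built on io_uring) could be added without touching any format code, which is the kind of decoupling a modular design of this sort aims for.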

Section 05

Multi-Backend Support

The framework implements four I/O backends:

Synchronous POSIX Backend: Uses thread-parallel block reads; suited to small, single-shard SafeTensors files
Tokio Asynchronous Backend: Built on the Tokio async runtime; suited to the range-access-heavy ServerlessLLM format
Linux io_uring Backend: Uses the Linux kernel's io_uring asynchronous I/O interface (available since Linux 5.1); suited to large multi-shard checkpoints (≥4GB)
Memory-Mapping Backend: Maps files directly into the process's address space via mmap
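
The mechanism named for the synchronous backend, thread-parallel block reading, can be illustrated with the standard library alone. The following sketch assumes a Unix platform (it uses `std::os::unix::fs::FileExt::read_exact_at`, i.e. positional `pread`-style reads) and splits a file into fixed-size blocks that scoped threads read concurrently. The function name and the `block_size`/`num_threads` parameters are hypothetical, not Tensora's API.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // positional reads (pread) on Unix
use std::path::Path;
use std::thread;

/// Hypothetical helper: reads a whole file by splitting it into
/// `block_size`-byte blocks and spreading them over `num_threads` workers.
pub fn parallel_block_read(path: &Path, block_size: usize, num_threads: usize) -> io::Result<Vec<u8>> {
    assert!(block_size > 0 && num_threads > 0, "block_size and num_threads must be non-zero");
    let file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    let mut buf = vec![0u8; len];

    // Pair each destination block of `buf` with its offset in the file.
    let blocks: Vec<(u64, &mut [u8])> = buf
        .chunks_mut(block_size)
        .enumerate()
        .map(|(i, chunk)| ((i * block_size) as u64, chunk))
        .collect();

    // Distribute blocks round-robin across workers.
    let mut per_worker: Vec<Vec<(u64, &mut [u8])>> = (0..num_threads).map(|_| Vec::new()).collect();
    for (i, block) in blocks.into_iter().enumerate() {
        per_worker[i % num_threads].push(block);
    }

    // `File` is Sync, and positional reads never touch a shared cursor,
    // so every worker can read through the same handle.
    let shared = &file;
    thread::scope(|s| {
        let handles: Vec<_> = per_worker
            .into_iter()
            .map(|work| {
                s.spawn(move || -> io::Result<()> {
                    for (offset, chunk) in work {
                        shared.read_exact_at(chunk, offset)?;
                    }
                    Ok(())
                })
            })
            .collect();
        // Propagate the first worker error (a worker panic is re-raised).
        handles
            .into_iter()
            .try_for_each(|h| h.join().expect("worker thread panicked"))
    })?;

    Ok(buf)
}
```

Positional reads take an explicit offset and leave the file cursor unchanged, which is why several threads can safely share one `&File` here instead of opening the file once per thread.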

Section 06

Adaptive Heuristic Algorithm

Tensora's core innovation is its adaptive default backend, which automatically selects a loading strategy based on the checkpoint format, its size, and its shard layout:

Scenario	Recommended Backend	Mechanism
Small/single-shard SafeTensors	sync	Thread-parallel block POSIX reading
Large multi-shard SafeTensors (≥4GB)	io_uring	Multi-worker ring submission
Range-access ServerlessLLM	async	Tokio per-file task grouping
Large partitioned ServerlessLLM	io_uring	Batch submission and merging
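
The table can be read as a small decision function. The sketch below encodes it in Rust under several stated assumptions: the enum and struct names are hypothetical, "4GB" is interpreted as 4 GiB, the same threshold is assumed to define a "large" ServerlessLLM checkpoint, and falling back when io_uring is unavailable stands in for the platform-capability check the article mentions without detailing.

```rust
/// Checkpoint storage formats named in the article.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    SafeTensors,
    ServerlessLlm,
}

/// The four backends described above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Sync,
    Async,
    IoUring,
    Mmap, // listed for completeness; no row of the table selects it
}

/// Hypothetical workload summary: format, total size, number of
/// shards/partitions, and whether the platform supports io_uring.
pub struct Workload {
    pub format: Format,
    pub total_bytes: u64,
    pub num_shards: usize,
    pub io_uring_available: bool,
}

/// Assumption: "4GB" in the table read as 4 GiB.
const LARGE_CHECKPOINT_BYTES: u64 = 4 * 1024 * 1024 * 1024;

pub fn select_backend(w: &Workload) -> Backend {
    let large_multi = w.total_bytes >= LARGE_CHECKPOINT_BYTES && w.num_shards > 1;
    match w.format {
        // Large multi-shard SafeTensors -> io_uring, if the kernel offers it.
        Format::SafeTensors if large_multi && w.io_uring_available => Backend::IoUring,
        // Small/single-shard SafeTensors (or no io_uring) -> parallel POSIX reads.
        Format::SafeTensors => Backend::Sync,
        // Large partitioned ServerlessLLM -> io_uring batch submission.
        Format::ServerlessLlm if large_multi && w.io_uring_available => Backend::IoUring,
        // Range-access ServerlessLLM (or no io_uring) -> Tokio tasks.
        Format::ServerlessLlm => Backend::Async,
    }
}
```

Keeping the decision in one pure function like this makes the heuristic easy to unit-test and to override, though how Tensora structures its own selection logic is not described in the article.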

This adaptive mechanism is intended to deliver near-optimal loading performance across different workloads without requiring users to tune the backend manually.
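
The table lists "batch submission and merging" for large partitioned ServerlessLLM checkpoints without describing it further. One common interpretation, assumed here rather than taken from Tensora's source, is coalescing nearby byte-range requests so that fewer, larger reads are submitted. A minimal Rust sketch with a hypothetical `ReadRequest` type and `max_gap` tolerance:

```rust
/// Hypothetical read request: the byte range [offset, offset + len) of one file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReadRequest {
    pub offset: u64,
    pub len: u64,
}

/// Sorts requests by offset and merges any that overlap, touch, or are
/// separated by at most `max_gap` bytes, producing fewer, larger reads.
pub fn merge_requests(mut reqs: Vec<ReadRequest>, max_gap: u64) -> Vec<ReadRequest> {
    reqs.sort_by_key(|r| r.offset);
    let mut merged: Vec<ReadRequest> = Vec::with_capacity(reqs.len());
    for r in reqs {
        if let Some(last) = merged.last_mut() {
            let last_end = last.offset + last.len;
            if r.offset <= last_end + max_gap {
                // Extend the previous read to cover this request too.
                let new_end = (r.offset + r.len).max(last_end);
                last.len = new_end - last.offset;
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

A larger `max_gap` reduces the number of submissions at the cost of reading some unused bytes between tensors; a suitable value would depend on the storage device.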

Section 07

Supported Storage Formats

Tensora natively supports two mainstream model storage formats:

Section 08

SafeTensors

The SafeTensors format, developed by Hugging Face, is widely used because it is safe to load (unlike pickle-based checkpoints, it cannot execute arbitrary code) and supports zero-copy access. Tensora is optimized for the flat storage layout of SafeTensors and efficiently handles both single-shard and multi-shard checkpoints.
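
The flat layout is what makes range-based loading straightforward: a SafeTensors file begins with an 8-byte little-endian unsigned integer N, followed by an N-byte UTF-8 JSON header, followed by the raw tensor data, and each tensor's `data_offsets` in the header are relative to the start of that data region. The hypothetical helpers below (not part of Tensora or the `safetensors` crate) split a file image at those boundaries; parsing the JSON itself is left out and would normally use a JSON library.

```rust
use std::ops::Range;

/// Hypothetical helper: returns the JSON header and the absolute offset at
/// which the tensor data region starts.
/// Layout: [u64 LE header length N][N bytes of JSON][tensor data...]
pub fn split_safetensors(bytes: &[u8]) -> Result<(&str, u64), String> {
    if bytes.len() < 8 {
        return Err("file is shorter than the 8-byte length prefix".to_string());
    }
    let mut prefix = [0u8; 8];
    prefix.copy_from_slice(&bytes[..8]);
    let header_len = u64::from_le_bytes(prefix);

    // Data begins right after the prefix and the header.
    let data_start = header_len.checked_add(8).ok_or("header length overflows u64")?;
    if (bytes.len() as u64) < data_start {
        return Err("file is shorter than the declared header".to_string());
    }
    let header = std::str::from_utf8(&bytes[8..data_start as usize])
        .map_err(|e| format!("header is not valid UTF-8: {e}"))?;
    Ok((header, data_start))
}

/// `data_offsets: [begin, end]` are relative to the data region, so the
/// absolute file range is shifted by `data_start`.
pub fn absolute_range(data_start: u64, begin: u64, end: u64) -> Range<u64> {
    (data_start + begin)..(data_start + end)
}
```

Once every tensor's absolute range is known, a loader can hand those ranges to whichever I/O backend it has chosen, without any per-tensor parsing of the data region.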

