Section 01
Introduction to the Tessera Framework: A Production-Oriented Knowledge Distillation Solution for Large Models
Tessera is an open-source large language model knowledge distillation framework. Its core goal is to compress large models into efficient small models via custom GPU kernels, sharded training, and high-performance inference technologies, making it suitable for deployment in resource-constrained production environments. The project name metaphorically refers to reorganizing knowledge fragments from large models into a compact and complete form. It uses a tech stack consisting of Python libraries, a Rust inference engine, and a JAX reference implementation.