Zing Forum


Therma: Replacing Softmax with Thermodynamic Relaxation to Explore a New Physical Paradigm for Large Model Inference

Therma is a high-performance simulation framework based on JAX. It replaces the traditional Softmax sampling head with a Discrete Thermodynamic Machine (DTM), reinterprets model weights as an energy landscape, and performs inference using stochastic relaxation and thermal noise, laying the groundwork for next-generation analog hardware AI.

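Tags: thermodynamic inference, Softmax replacement, analog hardware AI, JAX framework, Gibbs sampling, energy landscape, thermal noise, large model inference optimization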
Published 2026-04-08 18:42 · Recent activity 2026-04-08 18:49 · Estimated read 6 min

Section 01

[Introduction] Therma: Thermodynamic Relaxation as a New Physical Paradigm for Large Model Inference

Therma is a JAX-based simulation framework. Its central idea is to replace the conventional Softmax sampling head with a Discrete Thermodynamic Machine (DTM): model weights are treated as an energy landscape, and inference proceeds by stochastic relaxation under thermal noise. The project is positioned as groundwork for next-generation analog AI hardware.

Section 02

Background: Limitations of Traditional Softmax Inference and the Case for a Thermodynamic Approach

Large language model inference has long relied on exact numerical computation. The Softmax layer at the output is a core example: it normalizes exponentiated logits over the entire vocabulary, a deterministic operation that the project describes as energy-hungry and strongly tied to digital hardware. Therma instead draws on thermodynamic principles, replacing this exact computation with the relaxation process of a physical system and opening a possible path toward analog hardware AI.

Section 03

Core Methodology: Paradigm Shift from Precise Computation to Thermodynamic Relaxation

Therma projects the Transformer's hidden states onto a potential energy manifold and treats the model weights as energy coefficients. Instead of computing an expensive global normalization over the vocabulary, it samples with local Gibbs updates, each of which needs only local energy differences rather than a sum over all states. The guiding idea is to let the model 'relax' into an answer the way a physical system does: a thermodynamic system settles toward low-energy equilibrium states, and Therma selects tokens through a stochastic process, driven by thermal noise, that favors low-energy configurations.
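
A minimal sketch of local Gibbs sampling in JAX, assuming an Ising-style energy over binary spins. The coupling matrix `J`, the bias `b`, and all function names are illustrative stand-ins rather than Therma's actual API, and the post does not say how hidden states are mapped to an energy function. The point is only that each update uses a local field and a two-state normalization, never a sum over all configurations.

```python
import jax
import jax.numpy as jnp


def energy(s, J, b):
    """Ising-style energy E(s) = -0.5 * s^T J s - b^T s for spins s in {-1, +1}."""
    return -0.5 * s @ J @ s - b @ s


def gibbs_sweep(key, s, J, b, beta):
    """One sequential sweep: resample every spin from its local conditional."""
    n = s.shape[0]

    def update(i, carry):
        key, s = carry
        key, sub = jax.random.split(key)
        # Local field on spin i: depends only on row i of J and the current spins.
        field = J[i] @ s - J[i, i] * s[i] + b[i]
        # Flipping spin i from -1 to +1 lowers the energy by 2 * field,
        # so P(s_i = +1 | rest) = sigmoid(2 * beta * field).
        p_up = jax.nn.sigmoid(2.0 * beta * field)
        new_spin = jnp.where(jax.random.uniform(sub) < p_up, 1.0, -1.0)
        return key, s.at[i].set(new_spin)

    return jax.lax.fori_loop(0, n, update, (key, s))


def relax(key, s, J, b, beta, n_sweeps):
    """Run several Gibbs sweeps and return the final spin configuration."""
    for _ in range(n_sweeps):
        key, s = gibbs_sweep(key, s, J, b, beta)
    return s


# Toy problem with random symmetric couplings (assumed values, for illustration only).
k_j, k_b, k_run = jax.random.split(jax.random.PRNGKey(0), 3)
n = 8
A = jax.random.normal(k_j, (n, n))
J = 0.5 * (A + A.T)
J = J - jnp.diag(jnp.diag(J))  # zero the diagonal: no self-coupling
b = jax.random.normal(k_b, (n,))
s0 = jnp.ones(n)
s_final = relax(k_run, s0, J, b, beta=2.0, n_sweeps=20)
print(energy(s0, J, b), energy(s_final, J, b))
```

Raising `beta` makes each update closer to a deterministic descent step, while lowering it lets thermal noise dominate.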

Section 04

Technical Architecture: Parallel Design of Dual Thermodynamic Sampling Units (TSU)

Therma uses two Thermodynamic Sampling Units (TSUs) that work in parallel:

  • Unit A (Sampling Unit): Monitors the current token state and provides the basic information for decision-making;
  • Unit B (Relaxation Unit): Uses a delay window to compute the thermal equilibrium of the next token in parallel.

Because Unit B relaxes the next token while Unit A is still handling the current one, this pipeline hides the MCMC mixing time, turns serial computation into parallel physical processes, and avoids strict timing dependencies.

Section 05

Key Features: Physical Simulation Control and Model Behavior Adjustment Capabilities

Therma exposes fine-grained controls:

  • Hardware Constraint Simulation: Simulates DAC precision limits and the thermal noise floor so that the behavior of real analog hardware can be evaluated;
  • Beta (β) Control: Schedules the inverse temperature dynamically to adjust randomness. High β (low temperature) pushes the system toward deterministic outputs, while low β (high temperature) increases randomness and output diversity.

Temperature thus becomes a physically grounded control knob for model behavior.
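
A rough sketch of how these two controls could be simulated in JAX. The uniform quantizer, the Gaussian noise model, the geometric schedule, and every name and default value here are assumptions made for illustration; the post does not describe how Therma implements them.

```python
import jax
import jax.numpy as jnp


def quantize_dac(x, bits, full_scale=1.0):
    """Round x to the nearest level of a uniform DAC with 2**bits levels
    spanning [-full_scale, +full_scale]; values outside the range are clipped."""
    levels = 2 ** bits - 1
    step = 2.0 * full_scale / levels
    clipped = jnp.clip(x, -full_scale, full_scale)
    return jnp.round((clipped + full_scale) / step) * step - full_scale


def add_noise_floor(key, x, sigma):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return x + sigma * jax.random.normal(key, x.shape)


def beta_schedule(step, n_steps, beta_start=0.5, beta_end=4.0):
    """Geometric annealing of the inverse temperature from beta_start (hot)
    to beta_end (cold) over n_steps steps."""
    frac = step / max(n_steps - 1, 1)
    return beta_start * (beta_end / beta_start) ** frac


# Example: degrade ideal energy coefficients the way limited hardware might.
key = jax.random.PRNGKey(0)
ideal = jnp.linspace(-1.0, 1.0, 9)
hardware_like = add_noise_floor(key, quantize_dac(ideal, bits=4), sigma=0.01)
betas = [beta_schedule(t, 5) for t in range(5)]
print(hardware_like)
print(betas)
```

Sweeping `bits` and `sigma` gives a simple way to see how much precision loss and noise a sampler tolerates before its outputs change.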

Section 06

Implementation and Application: Deployment Under the JAX Framework and Visualization Demonstration

Therma is built on JAX and uses its automatic differentiation and GPU acceleration. The codebase contains the core TSU/DTM engine, visualization components, and proof-of-concept notebooks. The workflow has three steps: load a pre-trained model (e.g., Qwen2.5-0.5B), replace the Softmax head via weight surgery, and generate outputs through relaxation. The project also ships an interactive interface (index.html) that uses SVG and D3 to visualize how the energy manifold changes over time.
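
The last two steps can be sketched with stand-in arrays in place of a real checkpoint. This is not Therma's code: `token_energies` and `metropolis_relax` are hypothetical names, the "weight surgery" is reduced to reading the output projection as energy coefficients, and the sampler is a plain Metropolis chain with uniform proposals, swapped in here because it needs only energy differences and no normalization over the vocabulary.

```python
import jax
import jax.numpy as jnp


def token_energies(hidden, unembed):
    """Treat the output projection as energy coefficients: E_v = -(W h)_v."""
    return -(unembed @ hidden)


def metropolis_relax(key, energies, beta, n_steps, init_token=0):
    """Metropolis chain over token ids whose stationary distribution is
    proportional to exp(-beta * E_v). Only energy differences are used."""
    vocab = energies.shape[0]

    def step(token, key):
        k_prop, k_acc = jax.random.split(key)
        # Symmetric proposal: pick any token id uniformly at random.
        proposal = jax.random.randint(k_prop, (), 0, vocab, dtype=jnp.int32)
        delta = energies[proposal] - energies[token]
        # Accept with probability min(1, exp(-beta * delta)).
        accept = jax.random.uniform(k_acc) < jnp.exp(-beta * delta)
        token = jnp.where(accept, proposal, token)
        return token, token

    keys = jax.random.split(key, n_steps)
    init = jnp.asarray(init_token, dtype=jnp.int32)
    final, trace = jax.lax.scan(step, init, keys)
    return final, trace


# Stand-ins for a real model: a random hidden state and output projection.
k_h, k_w, k_run = jax.random.split(jax.random.PRNGKey(0), 3)
hidden = jax.random.normal(k_h, (16,))
unembed = jax.random.normal(k_w, (32, 16))  # (vocab, d_model), toy sizes
E = token_energies(hidden, unembed)
token, trace = metropolis_relax(k_run, E, beta=1.0, n_steps=500)
print(int(token))
```

With uniform proposals a chain like this mixes slowly on a realistic vocabulary, which is one reason a practical sampler would need more structured, local updates.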

Section 07

Significance and Outlook: Implications of Thermodynamic Inference for Next-Generation AI Hardware

Therma's value lies in adapting AI computation to analog hardware. Thermodynamic methods suit analog circuits naturally because noise can be exploited rather than eliminated. The project also prompts a basic question: does intelligence need to rest on precise computation? Developed by independent researchers, it offers a proof of concept for the vision of 'AI relaxing into answers' and merits attention from researchers studying new computing paradigms and from AI chip engineers.