LLMPhy: Combining Large Language Models with Physics Engines for Parameter-Identifiable Physical Reasoning

The LLMPhy framework, open-sourced by Mitsubishi Electric Research Laboratories, combines GPT with the MuJoCo physics engine via black-box optimization, enabling large models to estimate implicit physical parameters such as object mass and friction coefficient, and construct digital twins of real-world scenes.

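Tags: physical reasoning, large language models, MuJoCo, parameter identification, digital twin, robotics, Mitsubishi Electric, zero-shot learning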
Published 2026-04-29 03:29 | Recent activity 2026-04-29 03:51 | Estimated read 9 min

Section 01

Introduction: The LLMPhy Framework for Parameter Identification with Large Language Models and Physics Engines

LLMPhy, open-sourced by Mitsubishi Electric Research Laboratories (MERL), pairs GPT with the MuJoCo physics engine in a black-box optimization loop. This lets a large language model estimate implicit physical parameters, such as object mass and friction coefficient, and build digital twins of real-world scenes. The framework uses a two-stage decomposition strategy and an iterative feedback loop, works zero-shot, and ships with the LLMPhy-TraySim benchmark dataset. Together these offer a new technical path for scenarios such as robotic manipulation and autonomous driving.

Section 02

Background: The Challenge of Implicit Parameter Identification in Physical Reasoning

In real-world applications such as robotic manipulation and collision avoidance in autonomous driving, an AI system must understand not only how objects move but also the implicit physical parameters behind that motion: how heavy an object is, or what the friction coefficient of a surface is. Most learning-based physical reasoning methods, however, ignore this key challenge of parameter identification.

Without accurate parameter estimates, even the most advanced vision models cannot reconstruct a digital twin of a real-world scene in a physics engine, which limits how far AI systems can be applied to real physical interaction.

Section 03

Core Methods and Optimization Mechanisms of LLMPhy

Core Architecture of LLMPhy

LLMPhy is a black-box optimization framework proposed by Mitsubishi Electric Research Laboratories (MERL) that bridges the physical knowledge embedded in large language models (LLMs) and the world model implemented by the MuJoCo physics engine.

The framework adopts a two-stage decomposition strategy:

Stage 1: Continuous Physical Parameter Estimation. The system extracts object motion trajectories from multi-view video sequences, then uses GPT to generate Python programs that estimate continuous parameters such as mass and friction coefficient. These programs are executed in MuJoCo, and the reconstruction error against the observed trajectories is calculated.

Stage 2: Discrete Scene Layout Estimation. With the physical parameters in hand, the system estimates discrete layout parameters, such as the spatial position and orientation of each object, to complete the full scene reconstruction.
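
The Stage 1 error signal can be illustrated with a minimal sketch. It assumes a toy 1-D sliding block in place of MuJoCo and a mean squared position error as the reconstruction error; the function names, the impulse, and the error metric are all illustrative assumptions, not taken from the LLMPhy code.

```python
G = 9.81  # gravitational acceleration, m/s^2

def simulate_slide(mass, friction, impulse=1.0, dt=0.01, steps=100):
    """Toy stand-in for a physics engine (assumption, not MuJoCo):
    a block receives an impulse, then is slowed by Coulomb friction.
    Returns the block position after every time step."""
    velocity = impulse / mass  # the push sets the initial speed
    position = 0.0
    trajectory = []
    for _ in range(steps):
        # Friction decelerates the block until it comes to rest.
        velocity = max(0.0, velocity - friction * G * dt)
        position += velocity * dt
        trajectory.append(position)
    return trajectory

def reconstruction_error(simulated, observed):
    """Mean squared difference between simulated and observed positions."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)

# "Observed" trajectory generated with hidden ground-truth parameters.
observed = simulate_slide(mass=2.0, friction=0.3)
guess = simulate_slide(mass=1.0, friction=0.5)

print(reconstruction_error(guess, observed))     # positive: the guess is off
print(reconstruction_error(observed, observed))  # 0.0
```

In this toy setup the mass affects the initial speed and the friction affects the deceleration, so a single trajectory constrains both values, which is the sense in which the parameters are identifiable.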

Iterative Optimization Mechanism

The core innovation of LLMPhy lies in the iterative feedback loop: after each parameter estimation, the reconstruction error is fed back to the LLM to prompt it to improve the estimated values. This "generate-execute-feedback-optimize" loop allows the model to gradually converge to accurate parameters.

The entire process is fully zero-shot, requiring no fine-tuning for specific objects or scenes, and relies only on pre-trained physical common sense and visual input to complete reasoning.
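
A rough sketch of the generate-execute-feedback-optimize loop follows. It makes two loud assumptions: the physics engine is replaced by a toy 1-D sliding block, and the LLM call is replaced by a hypothetical `propose` function that randomly perturbs the best guess so far. In LLMPhy the proposal would come from GPT, prompted with the earlier guesses and their errors; the random search here only shows the shape of the loop.

```python
import random

G = 9.81  # gravitational acceleration, m/s^2

def simulate(mass, friction, impulse=1.0, dt=0.01, steps=100):
    """Toy stand-in for the physics engine (assumption, not MuJoCo)."""
    velocity, position, trajectory = impulse / mass, 0.0, []
    for _ in range(steps):
        velocity = max(0.0, velocity - friction * G * dt)
        position += velocity * dt
        trajectory.append(position)
    return trajectory

def error(params, observed):
    """Mean squared difference between a simulated and an observed trajectory."""
    simulated = simulate(**params)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)

def propose(history, rng):
    """Hypothetical stand-in for the LLM: perturb the best guess so far."""
    if not history:
        return {"mass": 1.0, "friction": 0.5}  # arbitrary first guess
    best_params, _ = min(history, key=lambda item: item[1])
    return {
        name: max(0.05, value * rng.uniform(0.8, 1.25))
        for name, value in best_params.items()
    }

def optimize(observed, iterations=200, seed=0):
    rng = random.Random(seed)
    history = []  # (params, error) pairs; this is what gets fed back
    for _ in range(iterations):
        params = propose(history, rng)  # generate
        err = error(params, observed)   # execute and score
        history.append((params, err))   # feedback for the next proposal
    return min(history, key=lambda item: item[1]), history

observed = simulate(mass=2.0, friction=0.3)
(best_params, best_error), history = optimize(observed)
print(best_params, best_error)
```

Keeping the full history, rather than only the latest guess, mirrors the idea that the proposer sees earlier attempts together with their errors before suggesting the next one.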

Section 04

Evidence: LLMPhy-TraySim Benchmark Dataset

Because existing physical reasoning benchmarks rarely consider parameter identifiability, the research team built the LLMPhy-TraySim dataset. It evaluates physical reasoning in a zero-shot setting and includes varied object configurations, push-rod interaction scenes, and the corresponding ground-truth physical parameters.

The dataset supports evaluating the two stages separately: first the model's ability to estimate physical parameters, then its ability to reconstruct the scene layout.
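
To make the two-stage evaluation concrete, here is a small sketch of how each stage might be scored. The metrics (mean relative error for continuous parameters, per-cell match rate for the layout), the grid encoding, and the object labels are assumptions chosen for illustration; the source does not state which metrics the benchmark uses.

```python
def mean_relative_error(predicted, truth):
    """Stage 1 (assumed metric): average |pred - true| / |true| per parameter."""
    return sum(abs(predicted[k] - truth[k]) / abs(truth[k]) for k in truth) / len(truth)

def layout_accuracy(predicted, truth):
    """Stage 2 (assumed metric): fraction of cells with the correct object label."""
    matches = sum(predicted.get(cell) == label for cell, label in truth.items())
    return matches / len(truth)

# Hypothetical ground truth and predictions for one sample.
true_params = {"mass": 2.0, "friction": 0.5}
pred_params = {"mass": 1.0, "friction": 0.75}

true_layout = {(0, 0): "A", (0, 1): "B", (1, 0): "C", (1, 1): "A"}
pred_layout = {(0, 0): "A", (0, 1): "B", (1, 0): "C", (1, 1): "B"}

print(mean_relative_error(pred_params, true_params))  # 0.5
print(layout_accuracy(pred_layout, true_layout))      # 0.75
```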

Section 05

Technical Implementation Details

The project is built on the MuJoCo 2.1.0 physics engine and the mujoco_py bindings. The code exposes a complete Python API, including:

  • An interaction layer between LLM and MuJoCo
  • Complete prompt templates for two-stage optimization
  • Automatic evaluation scripts that compare generated solutions against ground truth
  • Dataset generation tools (capable of creating new simulation samples)
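
One way an interaction layer could hand estimated parameters to MuJoCo is by writing them into an MJCF model. The sketch below builds such a model as a string using only the standard library, so it runs without MuJoCo installed. The scene (a box on a plane), the element names, and the sizes are illustrative placeholders, and whether LLMPhy passes parameters this way is an assumption. With mujoco_py, a string like this would typically be loaded with `mujoco_py.load_model_from_xml` and stepped through `mujoco_py.MjSim`.

```python
import xml.etree.ElementTree as ET

def build_mjcf(mass, friction):
    """Return a minimal MJCF model string: a free-moving box on a plane.
    Names and sizes are illustrative placeholders, not from LLMPhy."""
    root = ET.Element("mujoco", model="sketch")
    world = ET.SubElement(root, "worldbody")
    ET.SubElement(world, "geom", name="floor", type="plane", size="1 1 0.1")
    body = ET.SubElement(world, "body", name="object", pos="0 0 0.05")
    ET.SubElement(body, "joint", type="free")
    ET.SubElement(
        body, "geom", name="object_geom", type="box", size="0.05 0.05 0.05",
        mass=str(mass),
        # MJCF friction takes three values: sliding, torsional, rolling.
        friction=f"{friction} 0.005 0.0001",
    )
    return ET.tostring(root, encoding="unicode")

xml_text = build_mjcf(mass=2.0, friction=0.3)
print(xml_text)
```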

For Apple Silicon Mac users, the project documentation provides a detailed guide to configuring a Rosetta-compatible environment, which works around the problems of compiling mujoco_py on the ARM architecture.

Section 06

Application Prospects and Significance

LLMPhy demonstrates a new paradigm combining symbolic physical knowledge and neural reasoning capabilities, which is particularly suitable for:

  • Robotic manipulation planning: Estimating object weight and friction characteristics to optimize grasping strategies
  • Autonomous driving scene understanding: Predicting object motion trajectories after collisions
  • Physical simulation and digital twins: Automatically constructing interactive virtual scenes from visual observations
  • Scientific experiment analysis: Inferring physical system parameters from video data

The framework shows that LLMs can do more than answer physics questions: they can take an active part in identifying and optimizing physical parameters, which opens a new technical path for the development of Embodied AI.

Section 07

Usage and Extension Suggestions

Developers can adapt the framework to different physical reasoning tasks by modifying the prompt templates, or swap out the underlying physics engine (for example, migrating from MuJoCo to Isaac Gym). The project's modular design decouples the core iterative optimization logic from the specific physics simulation implementation, which makes it easy to reuse in different application scenarios.
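
The decoupling described here can be pictured as a small engine-agnostic interface. The sketch below is hypothetical: the `Simulator` protocol, its `rollout` method, and the toy backend are illustrative names and are not LLMPhy's actual API. Under this assumption, a MuJoCo or Isaac Gym wrapper would only need to expose the same method for the scoring logic to stay unchanged.

```python
from typing import Dict, List, Protocol

G = 9.81  # gravitational acceleration, m/s^2

class Simulator(Protocol):
    """Hypothetical engine-agnostic interface (not LLMPhy's actual API)."""
    def rollout(self, params: Dict[str, float]) -> List[float]:
        ...

class ToySlideSimulator:
    """Stand-in backend: a 1-D block pushed by an impulse, slowed by friction."""
    def __init__(self, impulse: float = 1.0, dt: float = 0.01, steps: int = 100):
        self.impulse, self.dt, self.steps = impulse, dt, steps

    def rollout(self, params: Dict[str, float]) -> List[float]:
        velocity = self.impulse / params["mass"]
        position, trajectory = 0.0, []
        for _ in range(self.steps):
            velocity = max(0.0, velocity - params["friction"] * G * self.dt)
            position += velocity * self.dt
            trajectory.append(position)
        return trajectory

def score(simulator: Simulator, params: Dict[str, float], observed: List[float]) -> float:
    """Optimization-side logic: depends only on the interface, not the engine."""
    simulated = simulator.rollout(params)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)

backend = ToySlideSimulator()
observed = backend.rollout({"mass": 2.0, "friction": 0.3})
print(score(backend, {"mass": 1.0, "friction": 0.5}, observed))
```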