Zing Forum

LSE-MTP: Multi-Token Prediction with Latent Semantic Enhancement for Building Consistent World Models

The study proposes the LSE-MTP method, which addresses the structural hallucination problem in standard multi-token prediction by anchoring predictions to real hidden state trajectories, effectively bridging the gap between discrete tokens and continuous state representations.

Tags: World Models · Multi-Token Prediction · Latent Semantic Enhancement · Structural Hallucination · Representation Learning · Gradient Inductive Bias · LLM
Published 2026-04-08 01:54 · Recent activity 2026-04-08 11:20 · Estimated read 6 min

Section 01

[Introduction] LSE-MTP: Addressing MTP Structural Hallucination to Build Consistent World Models

The consistency of internal world models in Large Language Models (LLMs) is a core debate in the AI field. Traditional Multi-Token Prediction (MTP) can learn structured representations, but it suffers from structural hallucination: discrete token supervision encourages shortcuts in the latent space that violate environmental constraints. This study proposes Multi-Token Prediction with Latent Semantic Enhancement (LSE-MTP), which bridges the gap between discrete tokens and continuous state representations by anchoring predictions to real hidden-state trajectories, mitigating structural hallucinations and improving the consistency and robustness of learned world models.


Section 02

Background: The Debate on LLM World Models and Evolution of Prediction Paradigms

The Debate on LLM World Models

The academic community is divided on whether LLMs possess true world models: one side argues they are statistical pattern matchers that only learn word correlations; the other believes they form internal models that can reason about world states. The core of the debate is whether internal representations capture the world's structure or merely memorize surface patterns.

From NTP to MTP: Evolution of Prediction Paradigms

Traditional Next-Token Prediction (NTP) focuses on single-step accuracy and struggles to capture long-range structures; Multi-Token Prediction (MTP) predicts multiple future tokens simultaneously, encouraging the learning of structured representations, inducing representation contractivity via gradient coupling, and promoting the convergence of internal beliefs.
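To make the paradigm shift concrete, here is a minimal NumPy sketch. The sizes and the separate-linear-heads layout are illustrative assumptions, not the paper's architecture: several prediction heads read the same hidden state, so the MTP loss is simply a sum of per-step next-token losses.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, HORIZON = 8, 16, 3  # toy sizes, assumed for illustration


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def ntp_loss(h, W, target):
    """Next-token prediction: one linear head, one step ahead."""
    return -np.log(softmax(h @ W)[target])


def mtp_loss(h, heads, targets):
    """Multi-token prediction: HORIZON heads read the *same* hidden state h,
    so every future target's gradient flows back through h."""
    return sum(ntp_loss(h, W, t) for W, t in zip(heads, targets))


h = rng.normal(size=HIDDEN)                          # shared hidden state
heads = [rng.normal(size=(HIDDEN, VOCAB)) for _ in range(HORIZON)]
targets = [1, 4, 2]                                  # the next 3 tokens

total = mtp_loss(h, heads, targets)
```

Because every term in the sum depends on the same `h`, supervision from all future tokens shapes one representation, which is the source of the structured-learning pressure described above.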


Section 03

Advantages of MTP and Concerns About Structural Hallucinations

The gradient inductive bias of MTP brings representation contractivity, mapping similar inputs to similar latent representations, which is beneficial for structured learning. However, standard MTP has structural hallucinations: discrete token supervision encourages shortcuts in the latent space, violating real-world constraints (such as physical laws), leading to vulnerability under out-of-distribution data.
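The gradient coupling behind this inductive bias can be made explicit: because all heads read the same hidden state, the gradient arriving at that state is the sum of the per-horizon gradients. A small sketch under the same toy assumptions (illustrative softmax heads, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, HIDDEN, HORIZON = 6, 8, 3  # toy sizes, assumed for illustration


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def ce_grad_h(h, W, t):
    """Analytic gradient of -log softmax(h @ W)[t] with respect to h."""
    p = softmax(h @ W)
    p[t] -= 1.0          # dL/dlogits = p - onehot(t)
    return W @ p         # chain rule through logits = h @ W


h = rng.normal(size=HIDDEN)
heads = [rng.normal(size=(HIDDEN, VOCAB)) for _ in range(HORIZON)]
targets = [0, 3, 5]

# Gradient coupling: the gradient reaching the shared hidden state is the
# sum of the per-horizon gradients, so all future targets pull on h at once.
coupled = sum(ce_grad_h(h, W, t) for W, t in zip(heads, targets))
```

Inputs whose futures agree receive similar summed pulls, which is one intuition for why MTP maps similar inputs to similar latent representations.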


Section 04

The LSE-MTP Method: A Solution Anchored to Real States

The core of LSE-MTP is to anchor predictions to real hidden state trajectories, using dual supervision: it not only predicts future tokens but also predicts corresponding real-world states (such as physical position and velocity). This mechanism prevents latent representations that violate constraints, bridges the gap between discrete tokens and continuous states, and provides additional training signals to enhance robustness.
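The dual-supervision objective can be sketched as a token cross-entropy plus a state-regression term that anchors the latent representation to the real trajectory. The head layout, the weighting `lam`, and all sizes are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, STATE, HORIZON = 8, 16, 4, 3  # toy sizes, assumed


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def lse_mtp_loss(h, token_heads, state_head, targets, true_states, lam=1.0):
    """Dual supervision: predict the next HORIZON tokens and, in parallel,
    regress the corresponding real-world states (e.g. position, velocity),
    anchoring the latent representation to the true trajectory."""
    token_loss = sum(-np.log(softmax(h @ W)[t])
                     for W, t in zip(token_heads, targets))
    pred = (h @ state_head).reshape(HORIZON, STATE)   # predicted trajectory
    state_loss = np.mean((pred - true_states) ** 2)   # anchor to real states
    return token_loss + lam * state_loss


h = rng.normal(size=HIDDEN)
token_heads = [rng.normal(size=(HIDDEN, VOCAB)) for _ in range(HORIZON)]
state_head = rng.normal(size=(HIDDEN, HORIZON * STATE))
targets = [1, 4, 2]
true_states = rng.normal(size=(HORIZON, STATE))       # e.g. positions/velocities

loss = lse_mtp_loss(h, token_heads, state_head, targets, true_states)
```

The state term only vanishes when the latent trajectory matches the real one, which is how the extra signal penalizes constraint-violating shortcuts while the token term preserves task performance.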


Section 05

Experimental Validation: Effectiveness on Synthetic and Real Tasks

The study validates LSE-MTP on two types of tasks:

  1. Synthetic Graph Traversal: Reduces structural hallucinations, and latent representations better reflect the real topology of the graph;
  2. Manhattan Taxi Trajectory Prediction: Improves prediction accuracy, and its robustness to noise perturbations is significantly better than standard MTP.

Section 06

Core Benefits: Representation Alignment and Robustness Improvement

LSE-MTP achieves representation alignment: latent representations are more consistent with the semantic structure of the real world, which improves interpretability and generalization. It also improves robustness, performing more stably on out-of-distribution data and under perturbations, addressing the vulnerability of standard MTP.


Section 07

Future Implications and Conclusion

Future Research Directions

  • Extend to complex modalities such as vision and audio;
  • Explore efficient acquisition of supervision signals (simulation environments, human feedback);
  • Combine with reinforcement learning and imitation learning;
  • Quantify the quality of world models.

Conclusion

LSE-MTP is an important step toward building trustworthy world models. It emphasizes that supervision signals need to balance task performance and real structure, providing new ideas for training AI that truly understands the world.