
TracePredict: Trajectory Motion Prediction Using Large Language Models

TracePredict applies the sequence modeling capabilities of large language models to trajectory prediction by converting spatiotemporal trajectory data into sequence tasks that language models can process.

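Tags: Large language models, Trajectory prediction, Autonomous driving, Sequence modeling, Multimodal learning, Robot navigation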

Published 2026-06-11 16:43 | Recent activity 2026-06-11 16:58 | Estimated read 8 min

Section 01

Introduction: TracePredict—Exploration of Trajectory Motion Prediction Using Large Language Models

The TracePredict project was released by li589 on GitHub on June 11, 2026. Its core idea is to use the sequence modeling capabilities of large language models (LLMs) for trajectory prediction by converting spatiotemporal trajectory data into sequence tasks that LLMs can process. The project explores the value of LLMs in trajectory prediction, covering technical implementation, application scenarios, and an analysis of advantages and limitations, and it offers a reference point for applying language models across domains and modalities.

Section 02

Background: Challenges in Trajectory Prediction and Motivations for LLM Application

Trajectory prediction is a core challenge for intelligent systems such as autonomous driving and robot navigation, and it directly affects their safety and efficiency. Traditional methods rely on RNNs, LSTMs, or Transformers to model spatiotemporal sequences. TracePredict turns to LLMs for three reasons:

Generalization of sequence modeling: The seq2seq capability of LLMs can be transferred to trajectory sequences;
Pre-trained knowledge: LLMs may implicitly encode physical common sense, social norms, and intent inference from their training text;
Multimodal fusion: LLMs natively accept text input, which makes it easier to integrate scene descriptions and other context.

Section 03

Technical Implementation: Trajectory Linguification and Model Design

Trajectory Linguification Representation

Grid discretization: The map is divided into grid cells, with positions corresponding to token IDs;
Relative displacement encoding: Represents relative movement (e.g., "0.5 meters forward");
Hybrid representation: Combines absolute position, relative displacement, and velocity information.
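
The first two representations above can be illustrated with a minimal sketch. The cell size, grid dimensions, and function names below are hypothetical values chosen for illustration; they are not taken from the TracePredict repository.

```python
import math
from typing import List, Tuple

# Hypothetical grid parameters, chosen only for illustration.
CELL_SIZE = 0.5      # metres per grid cell
GRID_WIDTH = 200     # cells per row
GRID_HEIGHT = 200    # number of rows

Point = Tuple[float, float]


def position_to_token(x: float, y: float) -> int:
    """Map a position in metres to one grid-cell token ID (row-major order)."""
    col = math.floor(x / CELL_SIZE)
    row = math.floor(y / CELL_SIZE)
    if not (0 <= col < GRID_WIDTH and 0 <= row < GRID_HEIGHT):
        raise ValueError(f"position ({x}, {y}) lies outside the grid")
    return row * GRID_WIDTH + col


def token_to_position(token: int) -> Point:
    """Return the centre of the grid cell that a token ID refers to."""
    row, col = divmod(token, GRID_WIDTH)
    return ((col + 0.5) * CELL_SIZE, (row + 0.5) * CELL_SIZE)


def relative_displacements(points: List[Point]) -> List[Point]:
    """Encode a trajectory as per-step (dx, dy) offsets from the previous point."""
    return [
        (round(x1 - x0, 3), round(y1 - y0, 3))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
```

A hybrid representation could pair the grid token of the first point with the displacement sequence that follows it, so that the absolute anchor and the fine-grained motion are both available to the model.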

Model Architecture

Decoder-only (GPT style): Autoregressively generates future trajectories;
Encoder-decoder (T5 style): Processes historical trajectories and generates future trajectories;
Instruction fine-tuning: Formats the task into natural language instructions.
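
As a rough illustration of the instruction format, the sketch below turns an observed trajectory into a natural-language prompt and parses coordinates back out of a text completion. The prompt wording and the helper names `build_prompt` and `parse_completion` are assumptions made for this example, not the project's actual template.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]

# Matches "(x, y)" pairs with optional sign and optional decimal part.
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def build_prompt(history: List[Point], horizon_steps: int, step_seconds: float) -> str:
    """Format observed (x, y) positions as a natural-language instruction."""
    observed = " ".join(f"({x:.2f}, {y:.2f})" for x, y in history)
    return (
        f"Observed positions in metres, sampled every {step_seconds} s: {observed}\n"
        f"Predict the next {horizon_steps} positions in the same format."
    )


def parse_completion(text: str) -> List[Point]:
    """Extract every '(x, y)' pair from a model completion as floats."""
    return [(float(x), float(y)) for x, y in _PAIR.findall(text)]
```

Keeping the input and output in the same coordinate format means the same parser can validate both the prompt and the model's answer.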

Training Strategy

Pre-training: Next-token prediction on large-scale trajectory datasets;
Fine-tuning: Supervised fine-tuning for specific scenarios (pedestrians/vehicles);
Reinforcement learning: Uses prediction accuracy as the reward signal to optimize long-horizon prediction quality.
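
The pre-training objective can be sketched as follows, assuming trajectories have already been converted to integer token IDs. The function names and the context-length parameter are illustrative; a real training loop would compute the loss in batches inside a deep learning framework.

```python
import math
from typing import List, Sequence, Tuple


def next_token_pairs(tokens: Sequence[int], context_len: int) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs: each token is predicted from the tokens before it."""
    pairs = []
    for i in range(1, len(tokens)):
        # Keep at most `context_len` preceding tokens as the context.
        context = list(tokens[max(0, i - context_len):i])
        pairs.append((context, tokens[i]))
    return pairs


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target token under a predicted distribution."""
    return -math.log(probs[target])
```

Supervised fine-tuning can reuse the same pairs, restricted to trajectories from the target scenario.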

Section 04

Application Scenarios: Autonomous Driving, Robotics, and Sports Analysis

Autonomous Driving

Pedestrian trajectory prediction: Input positions from the past 2 seconds, output path for the next 4 seconds;
Vehicle interaction prediction: Joint trajectory prediction in multi-vehicle scenarios.
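
To make the 2-second history and 4-second horizon concrete, the sketch below cuts a recorded track into (history, future) training windows. The sampling rate is an assumed parameter, and `split_windows` is a hypothetical helper rather than part of the project.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def split_windows(
    track: List[Point],
    sample_hz: float,
    history_s: float = 2.0,
    future_s: float = 4.0,
) -> List[Tuple[List[Point], List[Point]]]:
    """Slide over a track and return every (history, future) pair that fits."""
    n_hist = round(history_s * sample_hz)   # number of observed steps
    n_fut = round(future_s * sample_hz)     # number of steps to predict
    windows = []
    for start in range(len(track) - n_hist - n_fut + 1):
        mid = start + n_hist
        windows.append((track[start:mid], track[mid:mid + n_fut]))
    return windows
```

At an assumed 2 Hz, for example, each window holds 4 observed points and 8 points to predict, so a track shorter than 12 points yields no windows.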

Robot Navigation

Dynamic obstacle avoidance: Predict obstacle trajectories to plan safe paths;
Human-robot collaboration: Predict human intent to coordinate robot actions.

Sports Analysis

Player movement prediction: Assists in tactical analysis;
Game simulation: Generates virtual confrontations based on historical data.

Section 05

Advantages and Limitations: The Two Sides of LLM-based Trajectory Prediction

Potential Advantages

Advantage	Description
Few-shot adaptation	Prompt engineering enables quick adaptation to new scenarios
Interpretability	The model can explain the reasons for predictions
Knowledge transfer	Pre-trained knowledge improves generalization
Multi-task unification	A single model handles multiple tasks

Current Limitations

Computational cost: High inference latency, requiring GPU acceleration;
Accuracy loss: Discretization leads to reduced position accuracy;
Data hunger: Requires large amounts of trajectory-text paired data;
Physical constraints: May generate infeasible trajectories, requiring post-processing.
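
Two of these limitations can be quantified or mitigated with simple code. The sketch below computes the worst-case position error introduced by a square grid and applies one possible post-processing step, a speed clamp. Both functions and the choice of a speed limit as the feasibility check are assumptions made for illustration; the source does not specify which post-processing TracePredict uses.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def max_quantization_error(cell_size: float) -> float:
    """Worst-case distance between a point and the centre of its square grid cell."""
    return cell_size * math.sqrt(2) / 2


def clamp_speed(points: List[Point], max_speed: float, dt: float) -> List[Point]:
    """Shorten any predicted step that exceeds max_speed, keeping its direction.

    Each step's displacement is taken from the original prediction, scaled
    down if needed, and accumulated onto the corrected path.
    """
    if not points:
        return []
    max_step = max_speed * dt
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        dist = math.hypot(dx, dy)
        scale = min(1.0, max_step / dist) if dist > 0 else 1.0
        px, py = out[-1]
        out.append((px + dx * scale, py + dy * scale))
    return out
```

With a 0.5 m cell, the worst-case error is about 0.35 m, which shows why a finer grid or a relative-displacement encoding may be needed when sub-metre accuracy matters.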

Section 06

Related Research: Frontiers in Trajectory Prediction

Trajectory Transformer: Google treats trajectories as discrete sequences and performs well on the nuScenes dataset;
Multimodal prediction: LLMs combined with visual or lidar information can enhance scene understanding;
Diffusion models: These model multimodal distributions over future trajectories, and combining them with LLMs is an emerging direction.

Section 07

Project Value: Insights into Cross-modal Application Trends

TracePredict reflects the broader trend of breaking down barriers between modalities. Its main insights are:

Expansion of large model capabilities: The seq2seq architecture can be applied to non-text fields;
Representation is key: Converting domain data into a format the model can process is essential;
Transfer of pre-trained knowledge: LLM common sense can play a role in non-text fields.

This project provides a baseline for researchers and shows developers one path for extending LLMs beyond text.

Section 08

Future Outlook: Breakthroughs Driven by Multimodal Large Models

With the development of multimodal models such as GPT-4V and Gemini, trajectory prediction may achieve:

Visual-trajectory joint understanding: Predict movement directly by watching videos;
Conversational prediction: Obtain predictions through natural language scene descriptions;
Causal reasoning: Explain "why" while predicting.

TracePredict is an early exploration of this evolutionary path and is worth paying attention to.