Sources: UTwente slides; Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion; VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation; LAformer: Trajectory Prediction for Autonomous Driving with Lane-Aware Scene Constraints

Trajectory Conditional Prediction:

$$Y \leftarrow p_{θ} (Y_{H + 1 : F} ∣ X_{0 : H}, \text{constraints})$$

where the constraints include, for example, social interactions with other agents and map information.
$[0, H]$ is the observation time horizon and $[H + 1, F]$ is the prediction time horizon.

Conditional Prediction: decisions have to be made within seconds (e.g. autonomous driving) or minutes (e.g. marine applications).

We model trajectory prediction as a spatio-temporal mapping, which can be divided into:

Scene constraints
Multi-path prediction
Interaction modeling

The limitations of generative models on this task are:

Difficult to train
Limited variety, e.g. the mode collapse problem

Diffusion Models for Trajectory Prediction

Why Diffusion?

Multi-modality: naturally generate diverse possible futures (not just one)

Stable training: unlike GANs, no mode collapse or adversarial instability

Uncertainty modeling: probabilistic sampling fits real-world robotics needs

Flexible conditioning: can incorporate maps, goals, dynamics, safety constraints

Strong empirical results: state-of-the-art in trajectory forecasting & robot planning

So basically, they are state-of-the-art, and many applications have already been built on top of diffusion. There are also interesting improvements, such as Mean Flow, which appeared in May 2025.

So we return to the Conditional Prediction:

Encoding of the condition (C): scene constraints and interactions

$$Y \leftarrow p_{θ} (Y_{H + 1 : F} ∣ X_{0 : H}, C)$$

This can be done through:

Rasterized maps

e.g. bird-eye-views, semantic maps

Vectorized maps

e.g. HD maps
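To make the vectorized option concrete, here is a minimal sketch in the spirit of VectorNet, where a polyline (a lane centerline, say) is broken into consecutive vectors that share a polyline id. The function name `polyline_to_vectors` and the 5-column layout are assumptions for illustration; the paper's vectors also carry attribute features (object type, timestamps, and so on), which are left out here.

```python
import numpy as np

def polyline_to_vectors(points, polyline_id):
    """Split an ordered (P, 2) polyline into (P - 1, 5) vectors.

    Each row is [x_start, y_start, x_end, y_end, polyline_id].
    The column layout is an illustrative assumption.
    """
    points = np.asarray(points, dtype=float)
    starts, ends = points[:-1], points[1:]
    ids = np.full((len(starts), 1), float(polyline_id))
    return np.hstack([starts, ends, ids])

# Hypothetical lane centerline with three points.
lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
vectors = polyline_to_vectors(lane, polyline_id=7)
```

A rasterized map would instead render the same lane into the pixels of a bird's-eye-view image, which is the input a CNN expects.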

Pros and cons of these representations

Interaction Modeling:

Global interactions using Graph Convolutional Networks (GCN).

A fully-connected graph models agent-to-agent, agent-to-scene, and scene-to-scene interactions
Self-supervised learning is used to predict the masked nodes

Agent-to-scene interactions using likelihood estimation

Use a binary classifier to estimate, at each time step, the likelihood that each lane is aligned with the target agent's motion dynamics
Select only the top-k lane candidates
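The top-k selection step can be sketched as follows. This assumes the classifier has already produced a score per time step and per lane; the function name `top_k_lanes` and the toy scores are made up for illustration and are not taken from the LAformer code.

```python
import numpy as np

def top_k_lanes(lane_scores, k):
    """Return the indices of the k highest-scoring lanes per time step.

    lane_scores: (T, L) array of per-time-step lane likelihoods.
    Output: (T, k) array of lane indices, best lane first.
    """
    lane_scores = np.asarray(lane_scores, dtype=float)
    order = np.argsort(-lane_scores, axis=1)  # sort descending by score
    return order[:, :k]

# Toy scores for 2 time steps and 3 candidate lanes.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.6, 0.3, 0.1]])
selected = top_k_lanes(scores, k=2)
```

Only the selected lanes would then be passed on to the later stages, which keeps the scene context small.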

Interaction modeling with attention (i.e. leveraging transformers).

Each agent computes its own query vector (queries are individual to each agent).
Other agents provide the keys and values.
The attention weights are then computed from the queries and keys.
The result is a weighted sum over the interactions.
My intuition tells me you would need lots of data for this.
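The steps above can be sketched as single-head scaled dot-product attention over agent features. This is a generic sketch, not the architecture of any of the cited papers: the projection matrices are random stand-ins for learned weights, and each agent also attends to itself, as in standard self-attention.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(features, w_q, w_k, w_v):
    """features: (A, D) per-agent features. Returns (A, d_model) output and (A, A) weights."""
    q = features @ w_q                        # one query per agent
    k = features @ w_k                        # keys provided by all agents
    v = features @ w_v                        # values provided by all agents
    scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights               # weighted sum of values

rng = np.random.default_rng(0)
n_agents, d_in, d_model = 4, 6, 8             # hypothetical sizes
features = rng.normal(size=(n_agents, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d_model)) for _ in range(3))
out, weights = agent_attention(features, w_q, w_k, w_v)
```

Row `i` of `weights` tells how much agent `i` attends to every agent in the scene.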

Multi-path prediction:

Recall diffusion: the forward diffusion process gradually corrupts an input sample $x_{0}$ by adding Gaussian noise over $T$ timesteps.

$$q (x_{t} ∣ x_{0}) = \mathcal{N} (x_{t}; \sqrt{\bar{α}_{t}} \, x_{0}, (1 - \bar{α}_{t}) I)$$

where $\bar{α}_{t} = \prod_{s = 1}^{t} α_{s}$ is the cumulative product of the noise schedule parameters, with $α_{s} = 1 - β_{s}$.
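A minimal NumPy sketch of the forward process, using the standard DDPM closed form $x_{t} = \sqrt{\bar{α}_{t}} \, x_{0} + \sqrt{1 - \bar{α}_{t}} \, ϵ$ with $ϵ \sim \mathcal{N}(0, I)$. The linear schedule and its endpoints (`1e-4` to `0.02`) are a common default picked here as an assumption, and the straight toy trajectory is made up.

```python
import numpy as np

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed default) and its cumulative products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bar_t = prod_{s <= t} alpha_s
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a 1-indexed step t; also return the noise used."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t - 1]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule(100)
# Toy future trajectory: 12 (x, y) points moving straight along x.
x0 = np.stack([np.linspace(0.0, 5.0, 12), np.zeros(12)], axis=1)
x_t, eps = q_sample(x0, t=50, alpha_bars=alpha_bars, rng=rng)
```

Because $\bar{α}_{t}$ shrinks as $t$ grows, later steps keep less of the original trajectory and more noise.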

The denoising step trains a neural network to reverse the noise and recover data.

$$μ_{θ} (x_{t}, t) = \frac{1}{\sqrt{α_{t}}} \left( x_{t} - \frac{1 - α_{t}}{\sqrt{1 - \bar{α}_{t}}} ϵ_{θ} (x_{t}, t) \right)$$

$$p_{θ} (x_{t - 1} ∣ x_{t}) = \mathcal{N} (x_{t - 1}; μ_{θ} (x_{t}, t), Σ_{θ} (x_{t}, t))$$
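One reverse step can be sketched as below, using the standard DDPM posterior mean $\frac{1}{\sqrt{α_{t}}} \left( x_{t} - \frac{1 - α_{t}}{\sqrt{1 - \bar{α}_{t}}} ϵ_{θ} \right)$. Two assumptions: the variance is fixed to $β_{t} I$ (one common choice for $Σ_{θ}$, not necessarily the one used in the slides), and the network output $ϵ_{θ}$ is simply passed in as the array `eps_pred`. In the usage example the true noise stands in for the network, so the step at $t = 1$ recovers $x_{0}$ exactly.

```python
import numpy as np

def p_sample_step(x_t, t, eps_pred, betas, rng):
    """One ancestral sampling step x_t -> x_{t-1} for a 1-indexed step t.

    eps_pred plays the role of eps_theta(x_t, t); variance is fixed to beta_t (assumption).
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    a_t, a_bar_t, b_t = alphas[t - 1], alpha_bars[t - 1], betas[t - 1]
    mean = (x_t - (1.0 - a_t) / np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(a_t)
    if t == 1:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(b_t) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)  # assumed linear schedule
x0 = rng.standard_normal((12, 2))
eps = rng.standard_normal((12, 2))
# Exact sample from q(x_1 | x_0), since alpha_bar_1 = alpha_1 = 1 - beta_1.
x1 = np.sqrt(1.0 - betas[0]) * x0 + np.sqrt(betas[0]) * eps
recovered = p_sample_step(x1, 1, eps, betas, rng)
```

Sampling a full trajectory would start from pure Gaussian noise and apply this step for $t = T, \dots, 1$, calling the trained network at every step.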

So, how to apply diffusion models for trajectory prediction?

We defined the conditional prediction as $Y \leftarrow p_{θ} (Y_{H + 1 : F} ∣ X_{0 : H}, C)$ .

Now we are going to denoise $Y$ :

$f$ is the encoding of the condition $(X_{0 : H}, \text{constraints})$

$k \in [1, K]$ , where $K$ is the maximum number of diffusion steps.
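In the same notation as the reverse step above, the conditional denoising step could plausibly be written as follows. This is a sketch that assumes the formulation of the motion indeterminacy diffusion paper listed in the sources; $Y^{k}$ (the noisy future trajectory at diffusion step $k$) is notation introduced here for illustration.

$$p_{θ} (Y^{k - 1} ∣ Y^{k}, f) = \mathcal{N} (Y^{k - 1}; μ_{θ} (Y^{k}, k, f), Σ_{θ} (Y^{k}, k)), \qquad Y^{K} \sim \mathcal{N} (0, I)$$

Sampling starts from pure noise $Y^{K}$ and applies this step for $k = K, \dots, 1$. Repeating the procedure with different noise draws yields several distinct futures for the same condition $f$.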

Code implementation: github link. Many teams also submitted their approaches to the Argoverse 2: Motion Prediction Challenge. I will leave the link here for future reference: link to challenge.

One colleague asked in class why we always have to make it stochastic. Trajectory prediction can be made deterministic quite simply with tools such as cubic polynomials, splines, or Bezier curves. For example, during my bachelor's thesis I was collaborating with the Bosch Future Mobility Challenge group, and they approximated the short-distance future trajectory using Bezier curves, which I found extremely interesting. But I guess stochasticity allows you to slap a universal function approximator (a.k.a. a neural network) on the problem.
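For comparison, a deterministic short-horizon path from a cubic Bezier curve takes only a few lines. This is a generic sketch with made-up control points, not the method the Bosch Future Mobility Challenge team actually used.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, num_points=20):
    """Evaluate a cubic Bezier curve at num_points values of t in [0, 1]."""
    t = np.linspace(0.0, 1.0, num_points)[:, None]
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Hypothetical control points: current position, two shaping points, target position.
path = cubic_bezier((0, 0), (1, 0), (2, 1), (3, 1))
```

The curve always starts at the first control point and ends at the last one, so it gives exactly one future per set of control points. That is the contrast with a sampled, multimodal prediction.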

Some metrics include Average Displacement Error (ADE), Final Displacement Error (FDE)

$$ADE = \frac{1}{NT} \sum_{i = 1}^{N} \sum_{t = 1}^{T} ∥ \hat{x}_{i, t} - x_{i, t} ∥$$

$$FDE = \frac{1}{N} \sum_{i = 1}^{N} ∥ \hat{x}_{i, T} - x_{i, T} ∥$$

where

$N$ is the number of agents,
$T$ is the number of predicted timesteps,
$\hat{x}$ is the predicted position while $x$ is the ground-truth position.
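Both metrics reduce to a few NumPy operations. The sketch below assumes predictions and ground truth are arrays of shape `(N, T, 2)`; the function name `ade_fde` and the toy data are illustrative.

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: (N, T, 2) arrays of positions. Returns (ADE, FDE)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)  # (N, T) Euclidean error per agent and step
    ade = dist.mean()                          # average over agents and timesteps
    fde = dist[:, -1].mean()                   # average over agents at the last timestep
    return ade, fde

gt = np.zeros((2, 3, 2))
pred = np.zeros((2, 3, 2))
pred[0, :, 0] = [1.0, 2.0, 3.0]  # agent 0 drifts along x: errors 1, 2, 3
pred[1, :, 1] = [0.0, 0.0, 1.0]  # agent 1 is off only at the final step: error 1
ade, fde = ade_fde(pred, gt)     # ADE = 7/6, FDE = 2.0
```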

On top of these two, some other metrics can be defined.

Miss Rate (MR)
- The number of scenarios in which none of the forecasted trajectories is within 2.0 meters of the ground truth, measured by endpoint error
- $FDE = \frac{1}{N} \sum_{i = 1}^{N} ∥ \hat{x}_{i, T} - x_{i, T} ∥ > 2.0$
- This metric gives a general indication of how many scenarios fail
Collision Rate (CR)
- Percentage of generated trajectories that collide with other agents or obstacles, i.e. come within a distance of less than 0.1 m
Multimodal Predictions: as models output $K$ samples
- $\text{minADE}_{K}$ and $\text{minFDE}_{K}$
  - best-of-K error (take the predictions closest to the ground-truth)
- Miss Rate (MR@K)
  - Fraction of cases where none of the $K$ predicted trajectories fall within a set threshold (e.g., 2m) of the ground truth final point
  - Useful to measure coverage of plausible futures
- Brier-minFDE
  - $\text{Brier-minFDE} = \text{minFDE} + (1 - p)^{2}$ (as defined in the Argoverse benchmark, the Brier penalty is added to the endpoint error),
  - $p$ is the probability of the best predicted trajectory out of the K samples.
Negative Log-Likelihood
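The best-of-K metrics and the miss rate can be sketched as follows. Assumptions for illustration: predictions have shape `(S, K, T, 2)` for `S` scenarios with a single target agent each, the miss threshold is 2.0 m, and minADE and minFDE are each minimized independently over the K samples (some benchmarks instead pick the sample with the best endpoint for both).

```python
import numpy as np

def multimodal_metrics(preds, gt, miss_threshold=2.0):
    """preds: (S, K, T, 2), gt: (S, T, 2). Returns minADE_K, minFDE_K and MR@K."""
    dist = np.linalg.norm(preds - gt[:, None], axis=-1)  # (S, K, T)
    min_ade = dist.mean(axis=-1).min(axis=1)             # best average error per scenario
    min_fde = dist[:, :, -1].min(axis=1)                 # best endpoint error per scenario
    return {
        "minADE": min_ade.mean(),
        "minFDE": min_fde.mean(),
        "MR": (min_fde > miss_threshold).mean(),         # fraction of missed scenarios
    }

# Two toy scenarios, K = 2 samples, T = 2 steps, ground truth at the origin.
gt = np.zeros((2, 2, 2))
preds = np.zeros((2, 2, 2, 2))
preds[0, 0, 1] = [0.0, 1.0]  # scenario 0: endpoint errors 1 and 3, so not a miss
preds[0, 1, 1] = [0.0, 3.0]
preds[1, 0, 1] = [4.0, 0.0]  # scenario 1: endpoint errors 4 and 5, so a miss
preds[1, 1, 1] = [0.0, 5.0]
metrics = multimodal_metrics(preds, gt)  # minADE 1.25, minFDE 2.5, MR 0.5
```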

🚀 Costin Chitic
