This note focuses specifically on Lidar-Inertial SLAM.

The basic idea is to build a map and, at the same time, localize the robot on that map. I discussed the types of map representations and the scan-matching methods in Lidar-Inertial Perception.

GraphSLAM

“Given all sensor measurements and motion constraints collected so far… What is the most probable set of robot poses and map variables?”

In the graph representation, all robot states are discretized into nodes:
- Nodes are robot poses (circles) or observed features (stars).
- Links indicate either:
  - transformations $(\mathbf{R}, \mathbf{t})$ between consecutive poses (i.e., spatial constraints), or
  - observations of features, i.e., perception measurements.

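This means that the factor graph is itself a kind of topological map, whose links additionally carry metric information (relative transformations and measurements).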

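Since every edge encodes a spatial constraint between two nodes, and these noisy constraints are generally not consistent with each other, we need to optimize the graph:

Minimize the total error introduced by the two types of constraints (motion and measurement) by adjusting the node estimates until the links are satisfied as well as possible.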

STATE SPACE

We take the position and orientation of the robot as the state: $x_t \in S$.

If we consider a 2D graph, then one robot state is $x_t = (x, y, \theta)^T$.

How do we discretize the trajectory so that the problem stays computationally tractable?

We use the keyframe concept from Lidar-Inertial Perception: we discretize the trajectory either in time (one state every $\Delta t$) or in distance, allowing only one state per traveled distance (e.g. $d_0 = 0.1\ \text{m}$).
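A minimal sketch of the keyframe rule above, assuming a 2D position and a timestamp are available for every raw pose. The names `select_keyframes`, `d0`, and `dt` are hypothetical, and combining the distance and time criteria with an OR is my assumption:

```python
import math

def select_keyframes(positions, stamps, d0=0.1, dt=None):
    """Return the indices of the raw poses kept as graph nodes (hypothetical helper).

    A new keyframe is created once the robot has moved at least d0 metres since the
    last keyframe or, if dt is given, once dt seconds have passed since it
    (the OR combination is an assumption).
    """
    keep = [0]  # the first pose is always a node
    for i in range(1, len(positions)):
        last = keep[-1]
        moved = math.dist(positions[i], positions[last])
        elapsed = stamps[i] - stamps[last]
        if moved >= d0 or (dt is not None and elapsed >= dt):
            keep.append(i)
    return keep

# Robot driving along x at 0.03 m per sample, one sample every 0.1 s
positions = [(0.03 * i, 0.0) for i in range(10)]
stamps = [0.1 * i for i in range(10)]
print(select_keyframes(positions, stamps))           # [0, 4, 8]
print(select_keyframes(positions, stamps, dt=0.25))  # [0, 3, 6, 9]
```

With only the distance rule, a robot standing still never creates new nodes, which is what keeps the graph small.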

Motion Constraint

Since we are talking about Lidar-Inertial SLAM, the IMU tells us how the robot moved. In this 2D model we summarize that motion as the control input $u_t = (v_t, \omega_t)^T$, i.e., the translational and rotational velocity.

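$$
\begin{pmatrix} x_t \\ y_t \\ \theta_t \end{pmatrix} =
\begin{pmatrix} x_{t-1} \\ y_{t-1} \\ \theta_{t-1} \end{pmatrix} +
\begin{pmatrix}
-\frac{v_t}{\omega_t} \sin\theta_{t-1} + \frac{v_t}{\omega_t} \sin(\theta_{t-1} + \omega_t \Delta t) \\
\frac{v_t}{\omega_t} \cos\theta_{t-1} - \frac{v_t}{\omega_t} \cos(\theta_{t-1} + \omega_t \Delta t) \\
\omega_t \Delta t
\end{pmatrix}
$$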

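We denote

$$
g(u_t, x_{t-1}) =
\begin{pmatrix} x_{t-1} \\ y_{t-1} \\ \theta_{t-1} \end{pmatrix} +
\begin{pmatrix}
-\frac{v_t}{\omega_t} \sin\theta_{t-1} + \frac{v_t}{\omega_t} \sin(\theta_{t-1} + \omega_t \Delta t) \\
\frac{v_t}{\omega_t} \cos\theta_{t-1} - \frac{v_t}{\omega_t} \cos(\theta_{t-1} + \omega_t \Delta t) \\
\omega_t \Delta t
\end{pmatrix}
$$

as the motion constraint. Note that $g$ depends on the control input $u_t$ and the previous state $x_{t-1}$, not on $x_t$.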

To complete the update from one state to the next, we also need to take noise into consideration:

$$
\begin{pmatrix} x_t \\ y_t \\ \theta_t \end{pmatrix} = g(u_t, x_{t-1}) +
\begin{pmatrix} \mathcal{N}(0, \sigma_x^2) \\ \mathcal{N}(0, \sigma_y^2) \\ \mathcal{N}(0, \sigma_\theta^2) \end{pmatrix}
$$

where $R = \begin{pmatrix} \sigma_x^2 & 0 & 0 \\ 0 & \sigma_y^2 & 0 \\ 0 & 0 & \sigma_\theta^2 \end{pmatrix}$ is the process noise covariance matrix. Its inverse $R^{-1}$ is what weights the motion term in the cost function.
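To make the noisy motion update concrete, here is a minimal sketch of $g(u_t, x_{t-1})$ plus sampling of the process noise $\mathcal{N}(0, R)$. The function names are hypothetical, and the straight-line branch for $\omega_t \approx 0$ is an addition of mine, since the formula above divides by $\omega_t$:

```python
import math
import random

def motion_model(x_prev, u, dt):
    """Noise-free velocity motion model g(u_t, x_{t-1}) for a 2D pose (hypothetical helper)."""
    x, y, th = x_prev
    v, w = u
    if abs(w) < 1e-9:
        # Straight-line limit of the formula when omega -> 0 (assumption, not in the notes)
        return (x + v * dt * math.cos(th), y + v * dt * math.sin(th), th)
    r = v / w  # radius of the circular arc
    return (x - r * math.sin(th) + r * math.sin(th + w * dt),
            y + r * math.cos(th) - r * math.cos(th + w * dt),
            th + w * dt)

def sample_motion(x_prev, u, dt, sigmas, rng=random):
    """Draw x_t = g(u_t, x_{t-1}) + N(0, R), with R = diag(sigma_x^2, sigma_y^2, sigma_theta^2)."""
    mean = motion_model(x_prev, u, dt)
    return tuple(m + rng.gauss(0.0, s) for m, s in zip(mean, sigmas))

# Quarter circle: v = 1 m/s, omega = pi/2 rad/s for 1 s, starting at the origin facing +x
print(motion_model((0.0, 0.0, 0.0), (1.0, math.pi / 2), 1.0))  # about (0.6366, 0.6366, 1.5708)
print(sample_motion((0.0, 0.0, 0.0), (1.0, 0.0), 1.0, (0.05, 0.05, 0.01)))
```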

Measurement Constraint

This is mainly about finding the landmarks and making use of the information they provide. For this, we define a measurement model $h$ that relies on the landmarks $m_i$ with signatures $s_i$ and on the observer pose $x_t$.

We can then compare the measurement $z_t$ against the measurement predicted from the previously known position of landmark $m_i$.

Since we use the LiDAR for this, the measurement vector contains the range $r$, the bearing angle $\phi$ (relative to the robot orientation $\theta$), and the signature $s$:

$$
z_t = \begin{pmatrix} r_t \\ \phi_t \\ s_t \end{pmatrix} \approx
\begin{pmatrix}
\sqrt{(m_{j,x} - x)^2 + (m_{j,y} - y)^2} \\
\operatorname{atan2}(m_{j,y} - y,\ m_{j,x} - x) - \theta \\
s_j
\end{pmatrix} + \text{noise}
$$

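where we define

$$
h = \begin{pmatrix}
\sqrt{(m_{j,x} - x)^2 + (m_{j,y} - y)^2} \\
\operatorname{atan2}(m_{j,y} - y,\ m_{j,x} - x) - \theta \\
s_j
\end{pmatrix}
$$

and $Q = \begin{pmatrix} \sigma_r^2 & 0 & 0 \\ 0 & \sigma_\phi^2 & 0 \\ 0 & 0 & \sigma_s^2 \end{pmatrix}$ as the measurement noise covariance matrix (again, its inverse $Q^{-1}$ weights the measurement term in the cost function).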
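A minimal sketch of the range-bearing model $h$ and of the residual $z_t - h$, assuming landmarks are 2D points with a numeric signature. The helper names are hypothetical, and wrapping the bearing error into $[-\pi, \pi]$ is an addition of mine (otherwise a small angular error near $\pm\pi$ would look like a residual of almost $2\pi$):

```python
import math

def wrap_angle(a):
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def measurement_model(pose, landmark, signature):
    """Predicted LiDAR measurement h = (r, phi, s) of landmark m_j from pose x_t (hypothetical helper)."""
    x, y, th = pose
    dx, dy = landmark[0] - x, landmark[1] - y
    r = math.hypot(dx, dy)                      # range
    phi = wrap_angle(math.atan2(dy, dx) - th)   # bearing relative to the heading
    return (r, phi, signature)

def measurement_residual(z, pose, landmark, signature):
    """z_t - h(m_j, x_t), with the bearing difference wrapped (assumption)."""
    r, phi, s = measurement_model(pose, landmark, signature)
    return (z[0] - r, wrap_angle(z[1] - phi), z[2] - s)

print(measurement_model((0.0, 0.0, 0.0), (3.0, 4.0), 7))  # range 5.0, bearing about 0.9273 rad
```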

The basic SLAM problem

Together, these two constraints describe the basic SLAM problem: given the noisy control inputs $u$ and sensor readings $z$, how do we estimate the poses $x$ (localization) and the landmark positions $m$ (mapping)?

So the four important quantities are:

$x_t$: the pose of the robot, i.e., the body frame expressed in the world frame
$g$: the motion constraint, computed from the IMU data
$h$: the measurement model, i.e., the predicted LiDAR measurement of a landmark in the body frame
$z_t$: the actual LiDAR measurement of a landmark in the body frame

Graph Construction

After constructing the graph, we have a cost function $J$ to minimize. The graph contains all the measurements between times $t_0$ and $t_T$.

To construct the cost function, we first define the information matrix $\Omega$, in which the graph links are represented.

Therefore, we define:

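$$
J_{\text{GraphSLAM}} = x_0^T \Omega_0 x_0
+ \sum_{t=1}^{T} [x_t - g(u_t, x_{t-1})]^T R^{-1} [x_t - g(u_t, x_{t-1})]
+ \sum_{t=0}^{T} [z_t - h(m_{c_t}, x_t)]^T Q^{-1} [z_t - h(m_{c_t}, x_t)]
$$

The motion sum starts at $t = 1$ because the first motion constraint links $x_0$ and $x_1$; the prior term $x_0^T \Omega_0 x_0$ anchors the first pose.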
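To see what the three terms of $J_{\text{GraphSLAM}}$ do, here is a minimal sketch that evaluates the cost for a 1D toy problem. The 1D models are simplifying assumptions of mine, not the 2D models above: $g(u_t, x_{t-1}) = x_{t-1} + u_t$ and $h(m_j, x_t) = m_j - x_t$, with scalar weights standing in for $\Omega_0$, $R^{-1}$, and $Q^{-1}$:

```python
def graphslam_cost(x, m, u, measurements, omega0=1.0, r_inv=1.0, q_inv=1.0):
    """J_GraphSLAM for a 1D toy problem (hypothetical helper, assumed 1D models).

    x: poses x_0..x_T, m: landmark positions, u: odometry with u[t-1] moving x_{t-1} to x_t,
    measurements: list of (t, j, z) meaning landmark j was observed from pose t as z.
    """
    cost = x[0] * omega0 * x[0]                  # prior term anchoring x_0 at 0
    for t in range(1, len(x)):
        e = x[t] - (x[t - 1] + u[t - 1])         # motion error x_t - g(u_t, x_{t-1})
        cost += e * r_inv * e
    for t, j, z in measurements:
        e = z - (m[j] - x[t])                    # measurement error z_t - h(m_j, x_t)
        cost += e * q_inv * e
    return cost

u = [1.0, 1.0]
meas = [(0, 0, 5.0), (2, 0, 3.0)]
print(graphslam_cost([0.0, 1.0, 2.0], [5.0], u, meas))  # 0.0: every constraint satisfied
print(graphslam_cost([0.0, 1.5, 2.0], [5.0], u, meas))  # 0.5: x_1 violates two motion links
```

Shifting all poses and the landmark by the same amount leaves the motion and measurement terms unchanged; only the prior term penalizes it, which is why $x_0^T \Omega_0 x_0$ is needed.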

Taken from Section 11.4.3 of the Probabilistic Robotics book by S. Thrun:

We define $y_{0 : t}$ to be a vector composed of the robot poses $x_{0 : t}$ and the landmark positions $m = (m_{1}, m_{2}, \dots, m_{N})^{T}$ , whereas $y_{t}$ is composed of the momentary pose at time $t$ and the respective landmark:

$y_{0:t} = (x_0, x_1, \dots, x_t, m)^T$
$y_t = (x_t, m)^T$

Linearizing the Motion Model:

The various terms in the cost function above are quadratic in the functions $g$ and $h$, not in the variables we seek to estimate (poses and map). Thus, we have to linearize $g$ and $h$ via a first-order Taylor expansion around the current estimate $\mu_t$:

$$
g(u_t, x_{t-1}) \approx g(u_t, \mu_{t-1}) + G_t (x_{t-1} - \mu_{t-1})
$$

Here $\mu_t$ is the current estimate of the state vector $y_t$; in particular, $\mu_{t-1}$ holds the current estimate of the pose $x_{t-1}$.
$G_t = \frac{\partial g(u_t, x_{t-1})}{\partial x_{t-1}}$ is the Jacobian of $g$, evaluated at $x_{t-1} = \mu_{t-1}$.

We define the motion residual:

$$
r_t^{(u)} = x_t - g(u_t, \mu_{t-1})
$$

Then:

$$
x_t - g(u_t, x_{t-1}) \approx r_t^{(u)} - G_t (x_{t-1} - \mu_{t-1})
$$
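A minimal sketch of $G_t$ for the velocity motion model from the Motion Constraint section, checked against central finite differences. It assumes $\omega_t \neq 0$ (the formula divides by it), and the function names are hypothetical. Only the $\theta_{t-1}$ column differs from the identity, because $x_{t-1}$ and $y_{t-1}$ enter $g$ additively:

```python
import math

def motion_model(x_prev, u, dt):
    """g(u_t, x_{t-1}) for the velocity model; assumes omega != 0 (hypothetical helper)."""
    x, y, th = x_prev
    v, w = u
    r = v / w
    return [x - r * math.sin(th) + r * math.sin(th + w * dt),
            y + r * math.cos(th) - r * math.cos(th + w * dt),
            th + w * dt]

def motion_jacobian(x_prev, u, dt):
    """Analytic G_t = dg/dx_{t-1}, evaluated at the linearization point x_prev = mu_{t-1}."""
    th = x_prev[2]
    v, w = u
    r = v / w
    return [[1.0, 0.0, -r * math.cos(th) + r * math.cos(th + w * dt)],
            [0.0, 1.0, -r * math.sin(th) + r * math.sin(th + w * dt)],
            [0.0, 0.0, 1.0]]

def numerical_jacobian(f, x0, eps=1e-6):
    """Central finite differences, used only to check the analytic Jacobian."""
    n = len(x0)
    cols = []
    for k in range(n):
        xp, xm = list(x0), list(x0)
        xp[k] += eps
        xm[k] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(f(xp), f(xm))])
    return [[cols[k][i] for k in range(n)] for i in range(len(cols[0]))]  # columns -> rows

mu_prev, u, dt = [1.0, 2.0, 0.3], (1.0, 0.5), 0.1
G = motion_jacobian(mu_prev, u, dt)
G_num = numerical_jacobian(lambda s: motion_model(s, u, dt), mu_prev)
print(max(abs(a - b) for ra, rb in zip(G, G_num) for a, b in zip(ra, rb)))  # small: finite-difference error only
```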

Linearizing the Measurement Model:

$$
h(y_t, c_t) \approx h(\bar{y}_t, c_t) + H_t (y_t - \bar{y}_t)
$$

$\bar{y}_t$ is the current estimate of the state $y_t$
$H_t$ is the Jacobian of $h$

We define the measurement residual:

$$
r_t^{(z)} = z_t - h(\bar{y}_t, c_t)
$$

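In class, we split $H_t$ into the part that acts on the current pose and the part that acts on the observed landmark $m_{c_t}$ (the correspondence $c_t$ itself is not differentiated; it only selects which landmark is involved):

$$
H_t = \begin{bmatrix} \frac{\partial h}{\partial x_t} & \frac{\partial h}{\partial m_{c_t}} \end{bmatrix} = \begin{bmatrix} H_t^{y} & H_t^{c_i} \end{bmatrix}
$$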

The Jacobian of the residual $r_t^{(z)}$ w.r.t. the full state vector $X$ is:

$$
J_t^{(z)} = \begin{bmatrix} 0 & \cdots & -H_t^{y} & \cdots & -H_t^{c_i} & \cdots & 0 \end{bmatrix}
$$

It is a sparse (block) row that:
- is zero everywhere except at the positions corresponding to:
  - the current pose $x_t$ (where we have $-H_t^{y}$)
  - the observed landmark $m_{c_t}$ (where we have $-H_t^{c_i}$)

Both blocks carry a minus sign because $r_t^{(z)} = z_t - h(\cdot)$.

Example: if we are at pose $x_2$ observing landmark $m_1$:

$J_2^{(z)} = \begin{bmatrix} 0 & 0 & -H_2^{y} & 0 & \cdots & -H_2^{m_1} & 0 & \cdots \end{bmatrix}$
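A minimal numpy sketch of these blocks for the range-bearing part $(r, \phi)$ of the LiDAR model (the signature row is left out, since $s_j$ depends neither on the pose nor on the landmark position). The state layout (3 entries per pose followed by 2 per landmark, with 0-based landmark indices) and all function names are my assumptions:

```python
import numpy as np

def range_bearing_jacobians(pose, landmark):
    """H_t^y = dh/dx_t (2x3) and H_t^{c_i} = dh/dm (2x2) for h = (range, bearing) (hypothetical helper)."""
    x, y, _ = pose
    dx, dy = landmark[0] - x, landmark[1] - y
    q = dx * dx + dy * dy
    r = np.sqrt(q)
    H_pose = np.array([[-dx / r, -dy / r, 0.0],
                       [dy / q, -dx / q, -1.0]])
    H_lm = np.array([[dx / r, dy / r],
                     [-dy / q, dx / q]])
    return H_pose, H_lm

def residual_jacobian_row(t, j, H_pose, H_lm, n_poses, n_landmarks):
    """Block row J_t^(z) of r = z - h w.r.t. X = (x_0, ..., x_T, m_0, ..., m_{N-1}) (assumed layout)."""
    J = np.zeros((2, 3 * n_poses + 2 * n_landmarks))
    J[:, 3 * t:3 * t + 3] = -H_pose          # block of the current pose x_t
    col = 3 * n_poses + 2 * j
    J[:, col:col + 2] = -H_lm                # block of the observed landmark
    return J

# Pose x_2 observing the first landmark (index 0 here), with 3 poses and 2 landmarks
H_pose, H_lm = range_bearing_jacobians((0.0, 0.0, 0.0), (3.0, 4.0))
J = residual_jacobian_row(2, 0, H_pose, H_lm, n_poses=3, n_landmarks=2)
print(J.shape, np.flatnonzero(np.any(J != 0, axis=0)))  # shape (2, 13); non-zero columns 6 to 10
```

The translational part of the pose block is the negative of the landmark block, because $h$ only depends on the difference $m_j - (x, y)$.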

The Full Linearization of the Cost Function:

After linearization, we substitute back into the cost function. For the measurement term:

$$
\lVert z_t - h(y_t, c_t) \rVert_{Q_t^{-1}}^2 \approx \lVert r_t^{(z)} - H_t (y_t - \bar{y}_t) \rVert_{Q_t^{-1}}^2
$$

Let $\delta y_t = y_t - \bar{y}_t$ be the correction we want to find. Then:

$$
\lVert r_t^{(z)} - H_t \, \delta y_t \rVert_{Q_t^{-1}}^2
$$

By expanding this quadratic, we get the contributions to $\Omega$ and $\zeta$ (with $J$ the Jacobian of the residual):

Information matrix $\Omega$ (from the quadratic terms $J^T Q^{-1} J$). It tells us which states are connected.
Information vector $\zeta$ (from the linear terms $-J^T Q^{-1} r$). It tells us how much correction is needed in each direction.

Similarly, for the motion model, we have:

$$
x_t - g(u_t, x_{t-1}) \approx r_t^{(u)} - G_t (x_{t-1} - \mu_{t-1})
$$

Define the Jacobian for motion in the global state space, $J_t^{(u)} = [0 \ \cdots \ -G_t \ \cdots \ I \ \cdots \ 0]$, where:

$- G_{t}$ appears at position $i$ (for $x_{t - 1}$ )
$I$ (identity) appears at position $j$ (for $x_{t}$ )

The contribution to $Ω$ :

$$
\Omega \leftarrow \Omega + (J_t^{(u)})^T R_t^{-1} J_t^{(u)}
$$

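The contribution to $\zeta$, with the residual evaluated at the current estimate ($x_t = \mu_t$):

$$
\zeta \leftarrow \zeta - (J_t^{(u)})^T R_t^{-1} r_t^{(u)}
$$

The minus sign comes from minimizing $\lVert r + J \, \delta X \rVert^2_{R^{-1}}$, whose normal equations are $J^T R^{-1} J \, \delta X = -J^T R^{-1} r$.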

Once we have $Ω$ and $ζ$ , the solution is:

$$
\Omega \, \delta X = \zeta
$$

where $\delta X$ is the correction to apply to the current state estimate:

$$
X^{\text{new}} = X^{\text{old}} + \delta X
$$
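A minimal numpy sketch of the whole assemble-and-solve step for a 1D toy problem, with assumed 1D models ($g = x_{t-1} + u_t$, $h = m_j - x_t$) and a strong prior on $x_0$. Each constraint is written as an error $e$ with Jacobian $J = \partial e / \partial X$, evaluated at the current estimate, so $\Omega$ accumulates $J^T W J$ and $\zeta$ accumulates $-J^T W e$; this convention and all names are my assumptions. Because the toy models are linear, a single step already lands on the optimum:

```python
import numpy as np

def graphslam_step(X, u, measurements, n_poses, omega0=1e6, r_inv=1.0, q_inv=1.0):
    """One linearize-and-solve step for a 1D toy GraphSLAM problem (hypothetical helper).

    X = (x_0, ..., x_{n_poses-1}, m_0, m_1, ...) is the current estimate.
    u[t-1] is the odometry from x_{t-1} to x_t; measurements holds (t, j, z).
    """
    n = len(X)
    Omega, zeta = np.zeros((n, n)), np.zeros(n)
    factors = []                                   # (Jacobian row J, error e, weight W)

    J = np.zeros(n); J[0] = 1.0                    # prior: e = x_0
    factors.append((J, X[0], omega0))
    for t in range(1, n_poses):                    # motion: e = x_t - (x_{t-1} + u_t)
        J = np.zeros(n); J[t - 1], J[t] = -1.0, 1.0
        factors.append((J, X[t] - (X[t - 1] + u[t - 1]), r_inv))
    for t, j, z in measurements:                   # measurement: e = z - (m_j - x_t)
        J = np.zeros(n); J[t], J[n_poses + j] = 1.0, -1.0
        factors.append((J, z - (X[n_poses + j] - X[t]), q_inv))

    for J, e, W in factors:
        Omega += W * np.outer(J, J)                # quadratic term J^T W J
        zeta -= W * J * e                          # linear term -J^T W e
    dX = np.linalg.solve(Omega, zeta)              # Omega dX = zeta
    return X + dX, Omega

X0 = np.zeros(4)                                   # x_0, x_1, x_2, m_0 all start at 0
X, Omega = graphslam_step(X0, [1.0, 1.0], [(0, 0, 5.0), (2, 0, 3.0)], n_poses=3)
print(np.round(X, 6))        # close to [0, 1, 2, 5]
print(Omega[0, 2] == 0.0)    # True: x_0 and x_2 share no constraint
```

The zero entries of $\Omega$ are exactly the pairs of states without a shared constraint, which is the sparsity the factor graph exploits.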

Some insights:

We can recover the covariances after solving: $\Sigma = \Omega^{-1}$.
We iterate (relinearize around the new estimate and solve again) because the linearization is only accurate near the linearization point.
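Both insights show up on a small nonlinear toy problem that is not part of the notes: estimating a 2D position from noise-free range measurements to three known anchors. A minimal numpy sketch, assuming a range standard deviation `sigma` and hypothetical names. Each iteration relinearizes $h_i(p) = \lVert p - a_i \rVert$ at the current estimate, and $\Sigma = \Omega^{-1}$ is read off at the end. Here $J$ is the Jacobian of the prediction $h$ and $r = z - h$, so the information vector accumulates $+J^T W r$:

```python
import numpy as np

def gauss_newton_ranges(p0, anchors, ranges, sigma=0.1, iters=20, tol=1e-12):
    """Gauss-Newton for a 2D position from ranges; returns (estimate, covariance) (hypothetical helper)."""
    p = np.asarray(p0, dtype=float)
    W = 1.0 / sigma**2                             # information of one range measurement
    for _ in range(iters):
        Omega, zeta = np.zeros((2, 2)), np.zeros(2)
        for a, z in zip(anchors, ranges):
            d = p - np.asarray(a, dtype=float)
            h = np.linalg.norm(d)                  # predicted range
            J = d / h                              # dh/dp at the current estimate
            Omega += W * np.outer(J, J)
            zeta += W * J * (z - h)                # J^T W r with r = z - h
        dp = np.linalg.solve(Omega, zeta)
        p = p + dp                                 # relinearize around the new estimate next time
        if np.linalg.norm(dp) < tol:
            break
    return p, np.linalg.inv(Omega)                 # Sigma = Omega^{-1}

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_p = np.array([3.0, 4.0])
ranges = [np.linalg.norm(true_p - np.array(a)) for a in anchors]
p, Sigma = gauss_newton_ranges([1.0, 1.0], anchors, ranges)
print(p)                        # converges to about [3, 4]
print(np.sqrt(np.diag(Sigma)))  # 1-sigma uncertainty of x and y
```

A single linear solve from $(1, 1)$ only gets part of the way; the repeated relinearization is what brings the estimate to $(3, 4)$.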

Now we can answer these questions:

Why isn’t integrating all odometry enough?

Because odometry has cumulative errors (drift) that grow without bound over time: each step adds its own noise and nothing ever corrects it.

Considering a factor graph, why is it useful to represent the robot trajectory this way?

Sparsity (the factor graph represents only the constraints, not all correlations),

modularity (we can add constraints incrementally).

How do nodes help with scalability when the environment gets large?

We can discretize the trajectory with the keyframe concept, so the graph grows with the number of keyframes rather than with every single scan.

It also helps with traceability.

Loop Closure

Covered in Loop Closure.

Impact on the map:

Why can a single loop-closure constraint dramatically reshape the entire map?

Because GraphSLAM solves one global least-squares problem over all poses. By directly coupling two poses that are distant in the trajectory, the new constraint changes the optimum of the entire problem. Consequently, many poses are adjusted simultaneously to satisfy all constraints, not just the local ones.
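A minimal numpy sketch of this effect on a 1D toy trajectory that drives out and comes back to its start. The odometry is biased (assumed +0.1 m per step), so dead reckoning ends 0.6 m away from the start; a single loop-closure constraint $x_6 - x_0 = 0$ then moves every pose. Since everything is linear here, the sketch solves directly for the poses, $\Omega x = \zeta$ with $\Omega = A^T W A$ and $\zeta = A^T W b$ for the stacked constraints $A x \approx b$, instead of iterating on a correction. The names and weights are assumptions. It also illustrates the drift answer above: without the loop closure, the variance of $x_t$ grows linearly with $t$.

```python
import numpy as np

def solve_pose_chain(u, loop_closures=(), w_odom=1.0, w_lc=1.0, w_prior=1e6):
    """Least-squares 1D pose chain x_0..x_N; returns poses and covariance (hypothetical helper).

    u[t-1] is the measured odometry x_t - x_{t-1}; each loop closure (i, j, d)
    says that x_j - x_i was measured as d.
    """
    n = len(u) + 1
    rows, rhs, weights = [], [], []

    def add(i, j, d, w):                       # linear constraint x_j - x_i = d
        a = np.zeros(n)
        a[j] += 1.0
        if i is not None:
            a[i] -= 1.0
        rows.append(a)
        rhs.append(d)
        weights.append(w)

    add(None, 0, 0.0, w_prior)                 # prior: x_0 = 0
    for t, ut in enumerate(u, start=1):
        add(t - 1, t, ut, w_odom)
    for i, j, d in loop_closures:
        add(i, j, d, w_lc)
    A, b, W = np.array(rows), np.array(rhs), np.diag(weights)
    Omega = A.T @ W @ A                        # information matrix
    zeta = A.T @ W @ b                         # information vector
    return np.linalg.solve(Omega, zeta), np.linalg.inv(Omega)

u = [1.1, 1.1, 1.1, -0.9, -0.9, -0.9]         # true motion: +1 three times, then -1 three times
x_odo, S_odo = solve_pose_chain(u)
x_lc, S_lc = solve_pose_chain(u, loop_closures=[(0, 6, 0.0)])
print(np.round(x_odo, 3))                      # dead reckoning: ends at 0.6
print(np.round(x_lc, 3))                       # every pose x_1..x_6 shifts
print(S_odo[-1, -1], S_lc[-1, -1])             # variance of x_6 drops with the loop closure
```

With equal weights, the 0.6 m loop error is spread evenly over the seven edges of the loop, so the correction applied to $x_t$ grows with its distance from $x_0$ along the trajectory.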

Applications for Graph-SLAM:

For example, in a forest: the tree foliage causes GNSS errors, so the acquired trajectory and point cloud are noisy. Formulating the trajectory as a graph means the poses are linked by the measured relative transformations between them. Moreover, tree trunks appear circular in cross-section, so they are easy to detect and insert into the graph as landmarks. However, the method only succeeds when $D \gg d$, where $D$ is the distance between trees and $d$ is the statistical error of the trajectory. In this scenario, the trees are far enough apart that, even with trajectory noise, the system can clearly separate observations of different trees.

Where the Method Fails:

The system breaks down in dense forests. When the distance between trees (D) becomes close to or smaller than the statistical error (d), the observations overlap.

High Trajectory Noise: Larger drift makes the estimated position of a tree very uncertain.
Small Feature Distance: When trees are packed together, the robot cannot distinguish if a measurement belongs to “Tree A” or “Tree B”.
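The $D \gg d$ condition can be turned into a simple association test. A minimal sketch, assuming the observed trunk position has already been projected into the world frame and that its error is isotropic with standard deviation `sigma`. The 3-sigma gate and the rule "reject if a second tree is also inside the gate" are my assumptions, not a method from the notes:

```python
import math

def associate(obs, trees, sigma, gate=3.0):
    """Nearest-neighbour association with an ambiguity check (hypothetical helper).

    Returns the index of the matched tree, or None when no tree lies within
    gate * sigma (possibly a new tree) or when two trees do (ambiguous: D ~ d).
    """
    if not trees:
        return None
    dists = sorted((math.dist(obs, tree), i) for i, tree in enumerate(trees))
    radius = gate * sigma                      # assumed gate: 3 standard deviations
    if dists[0][0] > radius:
        return None
    if len(dists) > 1 and dists[1][0] <= radius:
        return None
    return dists[0][1]

sparse = [(0.0, 0.0), (5.0, 0.0)]              # D = 5 m, much larger than d
dense = [(0.0, 0.0), (0.5, 0.0)]               # D = 0.5 m, comparable to d
print(associate((0.2, 0.1), sparse, sigma=0.3))  # 0
print(associate((0.2, 0.1), dense, sigma=0.3))   # None: both trees are plausible
```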

Another idea: 3D Point Clouds

Each point is treated as a landmark.
Use ICP for scan matching.
Solve for the rotation and translation, and correct the trajectory.
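A minimal numpy sketch of point-to-point ICP in 2D, treating every point as a landmark as above. It assumes small clouds (brute-force nearest neighbours) and a reasonable initial alignment, and it solves for the rotation and translation with the SVD-based (Kabsch) solution; the names are hypothetical:

```python
import numpy as np

def best_fit_transform(P, Q):
    """R, t minimizing sum ||R p_i + t - q_i||^2 for paired rows of P and Q (Kabsch/SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # cross-covariance of the centred points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # avoid returning a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

def icp(source, target, iters=30, tol=1e-12):
    """Point-to-point ICP (hypothetical helper); returns R, t with R @ p + t aligning source to target."""
    dim = source.shape[1]
    R_total, t_total = np.eye(dim), np.zeros(dim)
    src = source.copy()
    prev_err = np.inf
    for _ in range(iters):
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)                 # correspondence: nearest target point
        err = np.sqrt(d2[np.arange(len(src)), nn]).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
        R, t = best_fit_transform(src, target[nn])
        src = src @ R.T + t                    # apply the increment to the scan
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

theta = np.deg2rad(3.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t_true = np.array([0.2, -0.1])
target = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [2, 6], [-3, 1], [6, -2], [-2, -4]], dtype=float)
source = (target - t_true) @ R_true           # so that R_true @ p + t_true = q
R_est, t_est = icp(source, target)
print(np.round(R_est, 6), np.round(t_est, 6))
```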

How does each sensor become just another constraint?

Sensor fusion is natural: data from diverse sensors is encoded as constraints. New measurements affect only local parts of the graph, which enables incremental and partial updates.

Data Association Uncertainty:

It is often the largest and most dangerous source of error. The robot doesn’t know:

which landmark it is observing
which scan feature corresponds to which past feature
whether scans overlap, or whether a loop closure is correct

GraphSLAM runs in a post-processing phase, so it is OFFLINE! It also has the largest possible sliding window: all $N$ states.

Bayesian GraphSLAM

slides 63-70

Notes from the professor

Why is it so hard for professors to make good materials? WHHYHWQQWRQ$!@#!EWQE!@# ok

Upgrading from Pairwise ICP to Scan-to-Map (covered in Pen and Paper Exercises SLAM):

Instead of aligning two individual scans $(q_i \approx d_i)$, which causes drift, it’s better to align each scan to a local map $M$.
That translates into minimizing the cost function $E_M(T) = \sum_{(p_i, m_i)} \lVert m_i - T p_i \rVert^2$, where $m_i \in M$ is the map point associated with the scan point $p_i$.
This local map, a “sliding window” of recent scans, provides a more stable geometric reference.
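A minimal numpy sketch of the local map and of $E_M(T)$, assuming the scans are already expressed in the world frame, that each $m_i$ is simply the nearest map point to $T p_i$, and that the window keeps a fixed number of scans. `LocalMap` and `scan_to_map_cost` are hypothetical names:

```python
from collections import deque

import numpy as np

class LocalMap:
    """Sliding-window local map M made of the last `window` scans, world frame (hypothetical helper)."""

    def __init__(self, window=10):
        self.scans = deque(maxlen=window)      # the oldest scan drops out automatically

    def add_scan(self, points_world):
        self.scans.append(np.asarray(points_world, dtype=float))

    def points(self):
        return np.vstack(list(self.scans))

def scan_to_map_cost(R, t, scan, map_points):
    """E_M(T) = sum ||m_i - T p_i||^2 with m_i the nearest map point to T p_i (assumed association)."""
    moved = scan @ R.T + t                     # T p_i for every scan point
    d2 = ((moved[:, None, :] - map_points[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

local_map = LocalMap(window=2)
local_map.add_scan([[5.0, 5.0]])               # old scan, will be dropped
local_map.add_scan([[0.0, 0.0], [1.0, 0.0]])
local_map.add_scan([[2.0, 0.0]])
scan = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
M = local_map.points()
print(len(local_map.scans), M.shape)                                # 2 (3, 2)
print(scan_to_map_cost(np.eye(2), np.zeros(2), scan, M))            # 0.0
print(scan_to_map_cost(np.eye(2), np.array([0.0, 0.5]), scan, M))   # 0.75
```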

Motion Compensation (“Unwrapping”):

The notes show how we can model the motion and measurement constraints. However, when we want to implement them, we have to take into account that the points of one scan were taken at different times $\tau \in [t_{\text{start}}, t_{\text{end}}]$. If we want to align them, we need to “unwrap” them to a single reference time $t$. This actively prevents the warping of geometry (if I scan a wall, I want it to be fking straight, no?).

So the raw points $p_{\text{raw}}(\tau)$ are corrected using:

$$
\tilde{p}_{\text{corrected}} = H_{W \to B(t)} \, H_{W \to B(\tau)}^{-1} \, \tilde{p}_{\text{raw}}(\tau)
$$

The first transformation, $H_{W \to B(\tau)}^{-1}$, takes the point from the body frame at time $\tau$ to the world frame; then $H_{W \to B(t)}$ brings all points back into the body frame, but w.r.t. the reference time $t$.
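A minimal numpy sketch of this correction in 2D. It assumes the body poses at $t_{\text{start}}$ and $t_{\text{end}}$ are known (e.g. from the IMU) as body-to-world poses $T_{WB}$, so that $H_{W \to B(\tau)} = T_{WB}(\tau)^{-1}$, and it interpolates the pose at each $\tau$ linearly, a constant-velocity assumption that is only reasonable for small rotations. The helper names are hypothetical. The usage example replays the wall case: a robot driving at 1 m/s towards a wall at $x = 5$ m.

```python
import numpy as np

def se2(x, y, th):
    """Homogeneous 3x3 matrix of a 2D body-to-world pose T_WB."""
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def interpolate_pose(pose_start, pose_end, t_start, t_end, tau):
    """Linear (constant-velocity) interpolation of (x, y, theta) at time tau (assumption)."""
    a = (tau - t_start) / (t_end - t_start)
    return [(1.0 - a) * p0 + a * p1 for p0, p1 in zip(pose_start, pose_end)]

def deskew(points, stamps, pose_start, pose_end, t_start, t_end, t_ref):
    """p_corrected = H_{W->B(t_ref)} H_{W->B(tau)}^{-1} p_raw(tau) for every point (hypothetical helper)."""
    T_ref = se2(*interpolate_pose(pose_start, pose_end, t_start, t_end, t_ref))
    H_ref = np.linalg.inv(T_ref)                      # H_{W->B(t_ref)}: world -> body at t_ref
    out = []
    for (px, py), tau in zip(points, stamps):
        T_tau = se2(*interpolate_pose(pose_start, pose_end, t_start, t_end, tau))  # = H_{W->B(tau)}^{-1}
        q = H_ref @ T_tau @ np.array([px, py, 1.0])   # body(tau) -> world -> body(t_ref)
        out.append(q[:2])
    return np.array(out)

# Robot moves from x = 0 to x = 1 during the sweep; the wall points are sampled at different times
stamps = [0.0, 0.25, 0.5, 0.75, 1.0]
wall_y = [-2.0, -1.0, 0.0, 1.0, 2.0]
raw = [(5.0 - tau, y) for tau, y in zip(stamps, wall_y)]   # the wall looks slanted in the raw scan
fixed = deskew(raw, stamps, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0, 1.0, t_ref=1.0)
print(fixed[:, 0])   # every point at x = 4: the wall is straight again, seen from the end pose
```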

Representing Errors in 3D (SE(3)):

We cannot subtract rotation matrices directly: the difference of two rotation matrices is not a rotation and doesn’t really represent anything.

So we use the Log map to convert the relative transformation between two poses into a 6D vector. It’s actually a really smart way of computing errors for optimization.
$r = \operatorname{Log}(H_{\text{meas}}^{-1} H_{\text{pred}}) \in \mathbb{R}^6$ yields a vector whose first 3 components are the rotation error and whose last 3 components are the translation error.

You don’t believe me? I wouldn’t either! Let’s see the mathematics.

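We consider $H = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$, $R \in SO(3)$, $t \in \mathbb{R}^3$.

The log map is $\operatorname{Log}(H) = \begin{bmatrix} \operatorname{Log}(R) \\ V^{-1} t \end{bmatrix}$, where $\operatorname{Log}(R)$ is the axis-angle vector $\phi$ satisfying

$$
R = \operatorname{Exp}(\phi), \quad \theta = \lVert \phi \rVert
$$

Hence, the rotation vector expresses a rotation of magnitude $\theta$ about the unit axis $u = \phi / \theta$. The rotation logarithm is

$$
\operatorname{Log}(R) = \frac{\theta}{2 \sin\theta} \begin{pmatrix} R_{32} - R_{23} \\ R_{13} - R_{31} \\ R_{21} - R_{12} \end{pmatrix}, \quad \theta = \arccos\left(\frac{\operatorname{trace}(R) - 1}{2}\right)
$$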

I’m not gonna memorize all this crap, but it’s good to have some context on how it’s actually done.

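The matrix $V$ is the left Jacobian of SO(3). The log map needs its inverse, which is:

$$
V^{-1} = I - \frac{1}{2} \hat{\phi} + \frac{1}{\theta^2} \left(1 - \frac{\theta \cot(\theta/2)}{2}\right) \hat{\phi}^2, \qquad
\hat{\phi} = \begin{bmatrix} 0 & -\phi_3 & \phi_2 \\ \phi_3 & 0 & -\phi_1 \\ -\phi_2 & \phi_1 & 0 \end{bmatrix}
$$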

This is very similar to my implementation in Rodrigues Rotation Formula, which uses the same skew-symmetric logic on SO(3).
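A minimal numpy sketch of $\operatorname{Log}(R)$, $V^{-1}$, and the residual $r = \operatorname{Log}(H_{\text{meas}}^{-1} H_{\text{pred}})$ using the formulas above. The small-angle branches (first-order limits as $\theta \to 0$) are my additions, the formula for $\operatorname{Log}(R)$ is ill-conditioned near $\theta = \pi$ (not handled here), and the function names are hypothetical:

```python
import numpy as np

def hat(phi):
    """Skew-symmetric matrix with hat(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -phi[2], phi[1]],
                     [phi[2], 0.0, -phi[0]],
                     [-phi[1], phi[0], 0.0]])

def log_so3(R):
    """Axis-angle vector phi = Log(R) (hypothetical helper; not robust near theta = pi)."""
    cos_th = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    th = np.arccos(cos_th)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if th < 1e-8:
        return 0.5 * w                         # small-angle limit: theta / (2 sin theta) -> 1/2
    return th / (2.0 * np.sin(th)) * w

def log_se3(H):
    """Log(H) = (Log(R), V^{-1} t): rotation error first, translation error last."""
    R, t = H[:3, :3], H[:3, 3]
    phi = log_so3(R)
    th = np.linalg.norm(phi)
    P = hat(phi)
    V_inv = np.eye(3) - 0.5 * P
    if th >= 1e-8:                             # second-order term (negligible for tiny angles)
        V_inv += (1.0 / th**2) * (1.0 - (th / 2.0) / np.tan(th / 2.0)) * (P @ P)
    return np.concatenate([phi, V_inv @ t])

def se3_residual(H_meas, H_pred):
    """r = Log(H_meas^{-1} H_pred) in R^6 (hypothetical helper)."""
    return log_se3(np.linalg.inv(H_meas) @ H_pred)

def rot_z(a):
    """Rotation by angle a about the z axis (used for the example only)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

H = np.eye(4)
H[:3, :3] = rot_z(0.5)
H[:3, 3] = [1.0, 2.0, 3.0]
print(np.round(log_se3(H), 4))             # rotation part (0, 0, 0.5), then V^{-1} t
print(np.round(se3_residual(H, H), 12))    # zero residual when prediction matches measurement
```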

Redefined Global Optimization

If we take sensor fusion into consideration, we need to take all residuals into account. Thus, the trajectory is solved by minimizing a sum of residuals from different sources (IMU, LiDAR, loop closures). This is how sensor fusion actually happens: each sensor is weighted based on its uncertainty. How we get here is covered in Pen and Paper Exercises SLAM.

$$
\hat{X} = \arg\min_X \left( \lVert r_0 \rVert_{\Sigma_0}^2 + \sum \lVert r^{\text{IMU}} \rVert_{\Sigma_{\text{IMU}}}^2 + \sum \lVert r^{\text{LIO}} \rVert_{\Sigma_{\text{LIO}}}^2 + \sum \lVert r^{\text{LC}} \rVert_{\Sigma_{\text{LC}}}^2 \right)
$$

Each residual $r$ is weighted by the inverse of its covariance $\Sigma$, allowing the system to trust the IMU during fast motion and the LiDAR when the geometry is clear. The covariance represents uncertainty, so its inverse represents information:

$$
\lVert r \rVert_{\Sigma}^2 = r^T \Sigma^{-1} r
$$

So intuition tells us that if the uncertainty is high, the information is low. This mathematically forces the optimizer to give that measurement less “vote” in the final trajectory.
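A minimal numpy sketch of this weighting, assuming two direct measurements of the same scalar state: one from a confident sensor ($\sigma^2 = 0.01$) and one from an uncertain sensor ($\sigma^2 = 1$). The numbers and names are illustrative only. Minimizing $\sum_k \lVert z_k - x \rVert^2_{\Sigma_k}$ gives the information-weighted mean, so the uncertain sensor barely moves the result:

```python
import numpy as np

def sq_mahalanobis(r, Sigma):
    """||r||^2_Sigma = r^T Sigma^{-1} r, computed with a solve instead of an explicit inverse."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    return float(r @ np.linalg.solve(Sigma, r))

def fuse(measurements, covariances):
    """argmin_x sum_k ||z_k - x||^2_{Sigma_k} for direct measurements z_k of one state x (hypothetical helper)."""
    infos = [np.linalg.inv(np.atleast_2d(np.asarray(S, dtype=float))) for S in covariances]
    Omega = sum(infos)                                          # information adds up
    zeta = sum(I @ np.atleast_1d(z) for I, z in zip(infos, measurements))
    return np.linalg.solve(Omega, zeta)

print(sq_mahalanobis([1.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))  # 0.25: error along the uncertain axis counts less
print(fuse([0.0, 1.0], [0.01, 1.0]))                         # about [0.0099]: the confident sensor dominates
```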

🚀 Costin Chitic
