Typologies of Image-Based 3D Vision

SfM

Dense Matching

Identify many more corresponding points
If you have two overlapping images, you want the disparity for every pixel (pixel-wise disparity estimation)
- From the pixel-wise disparity, we get the depth map.
Parallax = Disparity
Epipolar geometry is known
Goal: estimate the shape of object surfaces in detail
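
The step from pixel-wise disparity to a depth map can be sketched as follows. This is a minimal sketch assuming a rectified stereo pair, for which depth is $Z = \frac{B f}{d}$ with baseline $B$, focal length $f$ (in pixels) and disparity $d$ (in pixels). The function names and the example values are illustrative assumptions, not part of the notes.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Depth Z = B * f / d for one pixel of a rectified stereo pair."""
    if disparity_px <= 0:
        # No usable disparity: treated here as "no finite depth".
        return float("inf")
    return baseline_m * focal_px / disparity_px


def depth_map(disparity_map, baseline_m, focal_px):
    """Apply the disparity-to-depth conversion to every pixel."""
    return [
        [depth_from_disparity(d, baseline_m, focal_px) for d in row]
        for row in disparity_map
    ]


# Hypothetical 2x2 disparity map (pixels), baseline 0.5 m, focal length 800 px.
disparities = [[40.0, 20.0], [10.0, 0.0]]
depths = depth_map(disparities, baseline_m=0.5, focal_px=800.0)
for row in depths:
    print(row)
```

Halving the disparity doubles the depth, which is why distant points (small disparity) are the hardest to measure.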

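You first need to do SfM and then Dense Matching: SfM provides the camera poses, and with them the epipolar geometry that Dense Matching assumes to be known.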

Difficulties in finding corresponding points:

Images have different perspectives ⇒ The wider the baseline, the harder it gets.
Repetitive patterns (brick wall): how do you find the correct correspondence?
Surfaces without texture
Poor visibility (rain, fog, smoke)

Setups

One Camera - Forward Looking:

Poor intersection of rays, hence
- Poor positioning in direction of the corridor
- No depth estimation for the end of the corridor
  - The camera moves along the depth direction, so the baseline component perpendicular to the depth direction, which depth perception needs, is missing.
Nice solution: Panoramic Camera
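
The poor ray intersection can be illustrated with a small 2D sketch. It compares the angle between the two viewing rays to a point when the camera moves forward versus sideways by the same step. The coordinate convention (x lateral, z along the corridor), the function names and the numbers are assumptions made for illustration only.

```python
import math


def parallax_forward(x, z, step):
    """Angle (rad) between the rays to point (x, z) before and after
    the camera moves `step` metres forward along z."""
    return abs(math.atan2(x, z - step) - math.atan2(x, z))


def parallax_sideways(x, z, step):
    """Angle (rad) between the rays to point (x, z) before and after
    the camera moves `step` metres sideways along x."""
    return abs(math.atan2(x, z) - math.atan2(x - step, z))


# Point at the end of the corridor, straight ahead (x = 0, z = 30 m).
end_forward = parallax_forward(0.0, 30.0, 1.0)
end_sideways = parallax_sideways(0.0, 30.0, 1.0)

# Point on the side wall, close to the camera (x = 1.5 m, z = 5 m).
wall_forward = parallax_forward(1.5, 5.0, 1.0)

print(math.degrees(end_forward), math.degrees(end_sideways), math.degrees(wall_forward))
```

For the point straight ahead, forward motion gives a zero intersection angle, so its depth cannot be triangulated; the nearby wall point still gets a usable angle.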

Two Cameras:

Reasonable measurements in the nearby part of the corridor
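Poor accuracy at the end of the corridor because of the bad depth-to-base ratio. Recall $\sigma_{Z} = \frac{Z^{2}}{B f} \sigma_{D}$, with depth $Z$, baseline $B$, focal length $f$ and disparity precision $\sigma_{D}$.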
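
A short worked example of this error propagation, reading the formula as $\sigma_Z = \frac{Z^2}{B f} \sigma_D$. The baseline, focal length and disparity precision used here are assumed values chosen only to show the quadratic growth with depth.

```python
def depth_sigma(depth_m, baseline_m, focal_px, disparity_sigma_px):
    """Depth standard deviation sigma_Z = Z^2 / (B * f) * sigma_D."""
    return depth_m ** 2 / (baseline_m * focal_px) * disparity_sigma_px


# Assumed setup: baseline 0.5 m, focal length 800 px, disparity precision 0.5 px.
sigma_near = depth_sigma(5.0, 0.5, 800.0, 0.5)
sigma_far = depth_sigma(50.0, 0.5, 800.0, 0.5)

print(f"Z =  5 m: sigma_Z = {sigma_near:.3f} m")
print(f"Z = 50 m: sigma_Z = {sigma_far:.3f} m")
```

Ten times the depth gives a hundred times the depth uncertainty, which matches the poor accuracy at the far end of the corridor.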

Three Cameras:

Three cameras give three image pairs (three different baselines).
Points matched in the first two images can be projected into the third image to check the match.
This adds a lot of robustness and makes the data far more reliable.
- Repetitive patterns
- Random dot patterns
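
Projecting a match from the first two images into the third can be sketched for a simplified case. The sketch assumes three rectified cameras on a common line with the same focal length, so the disparity scales with the baseline; real trinocular rigs are generally not collinear, and all names and numbers below are hypothetical.

```python
def predict_x3(x1, x2, baseline_12, baseline_13):
    """Predict the column in image 3 from a candidate match (x1, x2).

    For collinear rectified cameras the disparity is proportional to the
    baseline: (x1 - x3) = (x1 - x2) * B13 / B12.
    """
    disparity_12 = x1 - x2
    return x1 - disparity_12 * baseline_13 / baseline_12


def consistent_in_third_view(x1, x2, x3_observed, baseline_12, baseline_13, tol_px=1.0):
    """Accept the match only if image 3 shows the point where it is predicted."""
    predicted = predict_x3(x1, x2, baseline_12, baseline_13)
    return abs(predicted - x3_observed) <= tol_px


# Repetitive pattern: two candidates in image 2 look alike (20 px apart).
x1, x3_observed = 300.0, 250.0
correct = consistent_in_third_view(x1, 280.0, x3_observed, 0.2, 0.5)
wrong = consistent_in_third_view(x1, 260.0, x3_observed, 0.2, 0.5)
print(correct, wrong)
```

The wrong candidate predicts a position in the third image that does not agree with the observation, so the ambiguity caused by the repetitive pattern is resolved.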

Camera + IMU:

VIO / VISLAM
- You can rely on the IMU alone, but only for a short period
- Sensor alignment
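
Why the IMU can only be trusted for a short period can be shown with a rough sketch: a constant accelerometer bias $b$, integrated twice, gives a position error of $\frac{1}{2} b t^2$. The bias value is an assumption, and the sketch ignores gyroscope bias, noise and gravity misalignment, which make real drift worse.

```python
def position_drift(accel_bias, seconds):
    """Position error (m) after double integration of a constant
    accelerometer bias (m/s^2): 0.5 * b * t^2."""
    return 0.5 * accel_bias * seconds ** 2


# Assumed bias of 0.05 m/s^2.
for t in (1.0, 10.0, 60.0):
    print(f"after {t:4.0f} s: drift = {position_drift(0.05, t):.3f} m")
```

The quadratic growth is the reason the visual measurements have to correct the IMU regularly.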

LiDAR:

You have to determine which parts of the scene are stable (fixed) and which are in motion
For 3D pose estimation, point clouds need to contain three surfaces with linearly independent normal vectors
- Otherwise we risk sliding: because the data looks the same after a motion along the unconstrained axis, the software “slides” along that axis, unable to tell if the robot moved 10 centimeters or 10 meters.
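
The independence condition can be checked with the scalar triple product: three normals are linearly independent exactly when $n_1 \cdot (n_2 \times n_3) \neq 0$. The sketch below assumes surface normals have already been extracted from the point cloud as unit vectors; the function names and the threshold are illustrative assumptions.

```python
from itertools import combinations


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def has_three_independent_normals(normals, min_volume=0.1):
    """True if some triple of unit normals clearly spans 3D space."""
    for n1, n2, n3 in combinations(normals, 3):
        if abs(dot(n1, cross(n2, n3))) > min_volume:
            return True
    return False


# Long corridor along x: floor, ceiling and two side walls only.
corridor = [(0, 0, 1), (0, 0, -1), (0, 1, 0), (0, -1, 0)]
# Same corridor with the end wall visible.
corridor_with_end_wall = corridor + [(1, 0, 0)]

print(has_three_independent_normals(corridor))                # sliding along x
print(has_three_independent_normals(corridor_with_end_wall))  # fully constrained
```

In the plain corridor no normal has a component along the corridor axis, which is exactly the direction in which the estimate can slide.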

LiDAR + IMU:

Camera + LiDAR:

They have complementary properties:
- Camera has poor depth perception with a small baseline
  - LiDAR has high ranging accuracy
- Camera depends on surface texture
  - LiDAR doesn’t require surface texture
- Camera has high spatial resolution
  - LiDAR has large point spacing on distant surfaces
- Camera needs little scene structure
  - LiDAR depends on scene structure!
Don’t forget about sensor alignment
- Rotation between axes should be calibrated
- Offset between sensor origins should be calibrated
- The LiDAR and the camera body need a strong, stiff connection
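
What the calibrated rotation and offset are used for can be sketched as follows: a LiDAR point is moved into the camera frame and projected with a pinhole model. The axis conventions (LiDAR: x forward, y left, z up; camera: x right, y down, z forward), the offset and the intrinsics are assumptions for illustration, not values from the notes.

```python
def lidar_to_camera(point_lidar, rotation, offset):
    """p_cam = R * p_lidar + t, with R as a 3x3 nested list."""
    return [
        sum(rotation[i][j] * point_lidar[j] for j in range(3)) + offset[i]
        for i in range(3)
    ]


def project(point_cam, focal_px, cx, cy):
    """Pinhole projection; returns None for points behind the camera."""
    x, y, z = point_cam
    if z <= 0:
        return None
    return (focal_px * x / z + cx, focal_px * y / z + cy)


# Assumed calibration: axis swap between the frames, LiDAR origin 10 cm above the camera.
R = [[0, -1, 0],
     [0, 0, -1],
     [1, 0, 0]]
t = [0.0, -0.1, 0.0]

p_cam = lidar_to_camera([10.0, 1.0, 0.0], R, t)
pixel = project(p_cam, focal_px=800.0, cx=640.0, cy=360.0)
print(p_cam, pixel)
```

Any error in `R` or `t` shifts every projected LiDAR point in the image, which is why both have to be calibrated and the mounting has to stay rigid.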

Camera + LiDAR + IMU: