Typologies of Image-Based 3D Vision
- Visual Odometry
- Visual SLAM
- Key difference: you're building up a map (it helps you understand where you are relative to the past) → e.g. the system recognizes that it has been at a point before (loop closure)
- Structure from Motion (SfM)
Structure from Motion vs Dense Matching
SfM
- Sparse point cloud
- SIFT
- No geometric constraints available yet to restrict the correspondence search
Dense Matching
- Identify many more corresponding points
- If you have 2 overlapping images, you want the disparity for each pixel (pixel-wise disparity estimation)
- Parallax = Disparity
- From the pixel-wise disparity, we get the depth map.
You first need to do SfM (to recover the camera geometry) and then Dense Matching.
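A minimal sketch of the dense-matching step, assuming the image pair has already been rectified using the poses from SfM (OpenCV's semi-global matcher as one possible matcher; the file names and calibration values are placeholder assumptions):

```python
import numpy as np
import cv2

# Rectified stereo pair (epipolar lines horizontal); file names are placeholders.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Pixel-wise disparity with semi-global matching.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # SGBM scales by 16

# Depth from disparity: Z = f * B / d (focal length in pixels, baseline in meters).
f, B = 800.0, 0.25          # placeholder calibration values
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]
```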
Difficulties in finding corresponding points:
- Images have different perspectives → the wider the baseline, the harder it gets.
- Repetitive patterns (brick wall) → how do you find the correct correspondence?
- Surfaces without texture
- Visibility (rain, fog, smoke)
Setups
One Camera - Forward Looking:
- Poor intersection of rays, hence:
- Poor positioning in the direction of the corridor
- No depth estimation for the end of the corridor
- The camera motion is in the same direction as the depth to be measured, so the baseline, which must be perpendicular to the depth direction, is missing.
- Nice solution: Panoramic Camera
Two Cameras:
- Reasonable measurements in nearby part of corridor
- Poor accuracy at end of corridor because of bad depth-to-base ratio. Recall the stereo relation below.
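As a reminder, the standard stereo relation (focal length $f$ in pixels, baseline $B$, disparity $d$, disparity precision $\sigma_d$):

$$ Z = \frac{fB}{d} \qquad\Rightarrow\qquad \sigma_Z = \frac{Z^2}{fB}\,\sigma_d $$

The depth error grows quadratically with distance and shrinks with a longer baseline, which is exactly why accuracy is poor at the end of the corridor.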
Three Cameras:
- I would be able to get 3 pairs (3 different baselines)
- I can project the points from the first two images into the third image to check consistency (see the sketch after this list).
- Adds a lot of robustness → makes the data much more reliable.
- Helps disambiguate repetitive patterns
- Projected random dot patterns can add texture where the surface has none
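A minimal sketch of that projection check with OpenCV, on synthetic data (the camera poses and points here are toy assumptions; in practice the projection matrices come from SfM):

```python
import numpy as np
import cv2

# Toy setup: three cameras along a common baseline, looking down +z.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
def P(tx):  # projection matrix of a camera translated by tx along x
    return K @ np.hstack([np.eye(3), np.array([[-tx], [0.0], [0.0]])])
P1, P2, P3 = P(0.0), P(0.5), P(1.0)

# Synthetic 3D points and their projections (2xN pixel coordinates).
X = np.random.rand(3, 20) * 2 + np.array([[0.0], [0.0], [5.0]])
def project(Pm, X):
    x = Pm @ np.vstack([X, np.ones((1, X.shape[1]))])
    return x[:2] / x[2]
pts1, pts2, pts3 = project(P1, X), project(P2, X), project(P3, X)

# Triangulate from the first pair, then reproject into the third image.
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
X_h /= X_h[3]  # normalize homogeneous coordinates
reproj = project(P3, X_h[:3])

# A match is kept only if it is consistent in all three views.
residual = np.linalg.norm(reproj - pts3, axis=0)
print("max reprojection error in view 3:", residual.max())  # ~0 here
```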
Camera + IMU:
- VIO (Visual-Inertial Odometry) / VI-SLAM
- You can rely on the IMU for short periods (e.g. when the camera is temporarily blinded), but its errors drift quickly
- Sensor alignment
LiDAR:
- You have to understand which parts of the scene are stable (static) and which are in motion
LiDAR + IMU:
- The IMU tells you how much you've moved in the past few seconds
- Can be used to eliminate mistakes
- Adds stability
Camera + LiDAR:
- They have complementary properties:

| Camera | LiDAR |
| --- | --- |
| Poor depth perception with a small baseline | High ranging accuracy |
| Depends on surface texture | Doesn't require surface texture |
| High spatial resolution | Large point spacing on distant surfaces |
| Needs little scene structure | Depends on scene structure |
- Don't forget about sensor alignment:
- The rotation between the sensor axes must be calibrated
- The offset between the sensor origins must be calibrated
- A strong, stiff connection between the LiDAR and the camera body is needed
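A minimal sketch of applying such an extrinsic calibration, i.e. transforming LiDAR points into the camera frame and projecting them (the rotation `R_cl`, offset `t_cl`, and intrinsics `K` are placeholder values, assumed to come from a prior calibration):

```python
import numpy as np

# Assumed calibration results: rotation and translation from the LiDAR
# frame to the camera frame, plus the camera intrinsic matrix.
R_cl = np.eye(3)                   # placeholder 3x3 rotation
t_cl = np.array([0.1, 0.0, 0.0])   # placeholder offset in meters
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def lidar_to_pixels(points_lidar):
    """Transform Nx3 LiDAR points into the camera frame and project them."""
    pts_cam = points_lidar @ R_cl.T + t_cl   # rigid-body transform
    pts_cam = pts_cam[pts_cam[:, 2] > 0]     # keep points in front of camera
    uvw = pts_cam @ K.T                      # pinhole projection
    return uvw[:, :2] / uvw[:, 2:3]          # pixel coordinates

pixels = lidar_to_pixels(np.random.rand(100, 3) * 10)
print(pixels.shape)
```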
Camera + LiDAR + IMU:
- Further improvement of accuracy and robustness
- Sensor Alignment again!
- They are highly complementary. Whenever possible, combine them!
Bundle Adjustment
Collinearity Equation
Setup: we have a scene seen from different perspectives (same idea as in SfM → let's say the Colosseum).
We know the distortion, the principal point location, etc. (the interior orientation).
We want to estimate the locations of all the points in 3D space.
In BA we estimate, in one go, the 3D geometry of all the points and all the camera poses in a common coordinate system at the same time.
Collinearity Equation: the standard form is given below.
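For reference, the photogrammetric collinearity equations in one common convention (principal distance $c$, principal point $(x_0, y_0)$, projection centre $(X_0, Y_0, Z_0)$, rotation matrix $R = (r_{ij})$):

$$ x = x_0 - c\,\frac{r_{11}(X - X_0) + r_{21}(Y - Y_0) + r_{31}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)} $$

$$ y = y_0 - c\,\frac{r_{12}(X - X_0) + r_{22}(Y - Y_0) + r_{32}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)} $$

Each observed image point contributes one such $(x, y)$ pair, i.e. two equations.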
Let's say we have 5 tie points that can be extracted in 5 different images.
- In total: 5 points × 5 images × 2 image coordinates per tie point = 50 collinearity equations
- 5 images × 6 pose parameters = 30 pose parameters
- 5 tie points × 3 world coordinates = 15 world coordinates
- 45 unknown parameters in total
- So we have 50 equations with 45 unknown parameters. Redundant, but solvable.
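A small sketch generalizing this count to $n$ images and $m$ tie points (assuming every point is visible in every image; the datum defect of 7 is ignored here):

```python
def ba_counts(n_images, m_points):
    """Count collinearity equations and unknowns for a bundle adjustment."""
    equations = 2 * n_images * m_points        # x and y per observation
    unknowns = 6 * n_images + 3 * m_points     # poses + world coordinates
    return equations, unknowns, equations - unknowns

print(ba_counts(5, 5))  # -> (50, 45, 5): redundant, but solvable
```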
Bundle Adjustment:
- Obtain approximate values needed for unknown parameters
- Linearize collinearity equations
Least Squares Parameter Estimation
The collinearity equations are non-linear, and least squares requires a linear relation between observations and unknowns.
So we linearize using a Taylor series around the approximate values, which gives the model $\ell + v = A\,\Delta x$ (reduced observations $\ell$, residuals $v$, design matrix $A$, corrections $\Delta x$).
So we solve the normal equations $A^{\top} P A\,\Delta\hat{x} = A^{\top} P\,\ell$,
and in the end we get $\Delta\hat{x} = (A^{\top} P A)^{-1} A^{\top} P\,\ell$.
We iterate the Least Squares Estimation until we minimize the weighted sum of squared residuals $v^{\top} P v$.
We need to define the origin of the world coordinate system!!! (it could be the first camera pose); otherwise the datum is undefined.
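A minimal Gauss–Newton sketch of this linearize-and-solve loop (NumPy only, unit weights; the exponential toy model is an illustrative assumption, not the collinearity equations):

```python
import numpy as np

def gauss_newton(residuals, model_jacobian, x0, max_iter=20, tol=1e-8):
    """Iterate linearize-and-solve until the corrections become negligible."""
    x = x0.copy()
    for _ in range(max_iter):
        r = residuals(x)          # observed minus computed (the ell vector)
        A = model_jacobian(x)     # design matrix of the linearized model
        # Solve the normal equations A^T A dx = A^T r (unit weights).
        dx = np.linalg.solve(A.T @ A, A.T @ r)
        x += dx
        if np.linalg.norm(dx) < tol:  # converged
            break
    return x

# Toy usage: fit y = a * exp(b * t) to noisy data.
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(1.5 * t) + 0.01 * np.random.randn(t.size)
res = lambda p: y - p[0] * np.exp(p[1] * t)
jac = lambda p: np.column_stack([np.exp(p[1] * t),
                                 p[0] * t * np.exp(p[1] * t)])
print(gauss_newton(res, jac, np.array([1.0, 1.0])))  # ~ [2.0, 1.5]
```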
Error Propagation
Recall the simple propagation rule from linear algebra: for a linear map $y = A x$, the covariance propagates as $\Sigma_{yy} = A\,\Sigma_{xx}\,A^{\top}$.
The covariance matrix is symmetric (and positive semi-definite).
Everything is simplified when all measurements are of the same type (say, image coordinates from one camera), since they can share the same weight.
If we over-parametrize the system, we get a singularity. Even with a rank deficiency of 7 (the free datum: 3 translations, 3 rotations, 1 scale), the solver will still output some values, so make sure the system is not singular.
We compute the correlation between parameters. If it gets too close to ±1, the parameters are highly correlated, so we might need to take one of them out.
Parameter Estimability:
Check the condition number of the normal matrix $N = A^{\top} P A$ (see the sketch below).
Check the correlation coefficients between parameters.
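A minimal sketch of both checks, given a design matrix `A` from the linearized system (NumPy only, unit weights; the correlation threshold of 0.95 is an illustrative assumption):

```python
import numpy as np

def estimability_checks(A, corr_threshold=0.95):
    """Condition number of the normal matrix and parameter correlations."""
    N = A.T @ A                      # normal matrix
    cond = np.linalg.cond(N)         # large value -> near-singular system
    cov = np.linalg.inv(N)           # parameter covariance (up to sigma^2)
    sigma = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sigma, sigma)   # correlation coefficients
    # Flag parameter pairs that are almost linearly dependent.
    bad = np.argwhere(np.triu(np.abs(corr), k=1) > corr_threshold)
    return cond, corr, bad

A = np.random.randn(50, 6)           # toy design matrix
cond, corr, bad = estimability_checks(A)
print(f"condition number: {cond:.1f}, suspicious pairs: {bad.tolist()}")
```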
For self-calibration we add, e.g., the intrinsic parameters as unknowns.
We need to make sure the point configuration keeps these parameters de-correlated (estimable).