Related: SeaClear, Coordinate Frames

I used this concept to determine the position of the underwater robot in a pool, while taking the refraction of the water into consideration.

With this, I converted 2D pixel coordinates into 3D world points, accounting for light refraction at the air-water interface.

I assumed:

  • World Frame: Origin at ArUco marker, Z pointing up
  • Camera Frame: Origin at camera optical center

I get some nasty errors that are likely due to the distortion parameters, since I already account for the refraction at the water surface. The distances could definitely be improved (especially around the edges of the FOV, again).

Basically what happens: when I get to the edge of the FOV in the UsbCamera frame, I get some nasty drift (you can see it in the plot image). I strongly believe this is a problem with the setup (I will shortly make a page on how to ensure a good setup when working with multiple cameras) and also with the poor quality of the ArduCam. Even though I calibrated the distortion coefficients to many decimal places, I still get that drift.

Convert pixel to normalized camera coordinates

import cv2
import numpy as np

pts = np.array([[[pixel_x, pixel_y]]], dtype=np.float64)
x_n, y_n = cv2.undistortPoints(pts, K, dist_coeffs)[0, 0]
dC = np.array([x_n, y_n, 1.0], dtype=np.float64)  # ray direction in the camera frame, not unit length
d_air_W = camera_orientation @ dC                 # rotate the ray into the world frame
o_W = camera_center                               # ray origin: camera center in world coordinates
  • Take the raw pixel coordinates (pixel_x, pixel_y)
  • Remove lens distortion with cv2.undistortPoints (it also applies the inverse of the intrinsic matrix, so the output is in normalized image coordinates)
  • The intrinsic matrix K maps 3D camera points to pixels: s * (u, v, 1)^T = K * (X, Y, Z)^T
  • (x_n, y_n, 1) represents where a ray through that pixel intersects the plane z = 1 in camera coordinates
  • d_air_W is the direction vector of a ray in world coordinates, starting at the camera center (o_W) and going through that pixel into the scene, before refraction
  • camera_orientation is the rotation matrix R from camera frame to world frame
  • camera_center is the camera position in world coordinates from the ArUco detection; I got it by converting the marker’s rotation vector with Rodrigues’ rotation formula and inverting the pose (see the sketch below)
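
For reference, here is a minimal sketch of how camera_orientation and camera_center can be recovered from the detection with OpenCV’s classic aruco API (frame, K, dist_coeffs, the dictionary, and the marker length are placeholders, not necessarily what I used):

import cv2
import numpy as np

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
marker_len = 0.15  # marker side length in metres (placeholder)
rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(corners, marker_len, K, dist_coeffs)

R, _ = cv2.Rodrigues(rvecs[0])              # Rodrigues' formula: rotation vector -> matrix (world -> camera)
camera_orientation = R.T                    # camera -> world rotation
camera_center = -R.T @ tvecs[0].reshape(3)  # camera position in the marker (world) frame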

Camera coordinate system:

    Z (forward, into scene)
    ^
    |
    |    β€’ (x_n, y_n, 1) <-- point on image plane
    |   /
    |  /
    | /
    o───────→ X (right)
   /
  ↓
  Y (down)

Intersect with water surface (z=0)

denom = d_air_W[2]
s0 = (0.0 - o_W[2]) / denom  # parameter along the ray where it reaches z = 0
if s0 <= 0:
    rospy.logwarn(f"Ray does not intersect water surface, or camera is below water surface, camera_z = {o_W[2]}")
    return None
I0_W = o_W + s0 * d_air_W  # intersection point of the air ray with the water surface
  • denom is the z component of the ray
  • s0 tells me how far I need to go along the ray to intersect the water surface plane z = 0
    • If s0 > 0 ⇒ it’s in front (valid)
    • If s0 ≤ 0 ⇒ it’s behind (invalid)
  • I0_W is the 3D intersection point of the incoming ray with the water surface plane (z = 0)
        o_W (camera)
         \
          \  s0 = distance to water
           \
    ────────●──────────  z = 0
           I0_W
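
A quick sanity check of these two lines with made-up numbers (camera 1.5 m above the water, ray pointing down and slightly forward):

import numpy as np

o_W = np.array([0.0, 0.0, 1.5])       # camera 1.5 m above the surface
d_air_W = np.array([0.3, 0.0, -0.9])  # ray direction in world coordinates
s0 = (0.0 - o_W[2]) / d_air_W[2]      # = 1.67 > 0, so the hit is in front of the camera
I0_W = o_W + s0 * d_air_W             # = [0.5, 0.0, 0.0], a point on the z = 0 plane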

Compute refracted direction into water

n_surface_up = np.array([0.0, 0.0, 1.0])  # unit normal of the water surface, pointing up into the air
d_in = d_air_W / np.linalg.norm(d_air_W)  # normalize the incident ray direction
d_wtr = self.refract_dir(d_in, n_surface_up, air_n, water_n)  # air_n = 1.0, water_n = 1.33
  • d_in is the unit direction vector of the ray in the air pointing towards I0_W
  • d_wtr is the refracted ray inside the water, computed with Snell’s Law (a sketch of refract_dir follows the diagram below)
    AIR (n=1.0)
         \  θ₁ (incident angle)
          \
    ───────●─────────  surface
          /
         / ΞΈβ‚‚ (refracted angle, smaller)
        /
    WATER (n=1.33)
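
refract_dir itself is not shown above; below is a minimal sketch of the vector form of Snell’s law that such a helper can implement (only the call signature is taken from my code above, the body here is a generic reconstruction):

import numpy as np

def refract_dir(d_in, n_up, n1, n2):
    # d_in: unit incident direction (pointing down, towards the water)
    # n_up: unit surface normal pointing up, into the incident medium (air)
    # n1, n2: refractive indices of the incident and transmitting media
    cos_i = -np.dot(n_up, d_in)            # cosine of the incidence angle
    ratio = n1 / n2
    k = 1.0 - ratio**2 * (1.0 - cos_i**2)  # squared cosine of the refraction angle
    if k < 0.0:
        return None                        # total internal reflection (cannot happen going air -> water)
    return ratio * d_in + (ratio * cos_i - np.sqrt(k)) * n_up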

Intersect refracted ray with plane z = depth (depth coming from the IMU)

denom2 = d_wtr[2]
if abs(denom2) < 1e-9:
    return None  # refracted ray parallel to the water surface, no intersection
s1 = (float(depth) - I0_W[2]) / denom2  # distance along the refracted ray to the depth plane
P_W = I0_W + s1 * d_wtr                 # 3D point on the ROV's depth plane
  • denom2 is the z component of the refracted ray direction
    • If denom2 = 0 ⇒ ray parallel to water surface after refraction (no intersection)
    • If denom2 < 0 ⇒ ray pointing downward into the water (valid)
  • s1 tells me how far along the ray to travel to reach plane at depth level
  • P_W is the 3D world point on the ROV’s depth plane that corresponds to the original pixel, AFTER refraction. This is what I want to get!! (reference + direction * distance)
    ─────────●───────────  z = 0 (I0_W)
              \
               \  d_wtr (refracted ray)
                \
                 ● P_W   z = depth
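
Continuing the made-up numbers from above, with the refracted direction rounded and the ROV depth reported as -1.0 m (z axis points up, surface at z = 0, so points under water have negative z):

import numpy as np

I0_W = np.array([0.5, 0.0, 0.0])          # entry point on the surface from the earlier example
d_wtr = np.array([0.2378, 0.0, -0.9713])  # unit refracted direction of that ray
depth = -1.0                              # ROV 1 m below the surface
s1 = (depth - I0_W[2]) / d_wtr[2]         # ≈ 1.03
P_W = I0_W + s1 * d_wtr                   # ≈ [0.74, 0.0, -1.0]
# without refraction the same pixel would land at x ≈ 0.83: roughly 9 cm of error at 1 m depth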

Convert back to camera frame

R_CW = camera_orientation.T  # world -> camera rotation (inverse of camera -> world)
p_C = R_CW @ (P_W - o_W)     # express the world point relative to the camera
return p_C
  • R_CW is the rotation matrix that takes a world point and expresses it in the camera frame. I will let the TF library from ROS handle the conversion back to world.
  • p_C is the 3D point in camera coordinates, which is then transformed to the ArUco marker frame by the transform_point_to_world function (sketched below). We subtract P_W - o_W to get the relative offset from the camera to that exact point under water.
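
transform_point_to_world itself is not shown here; a minimal sketch of what it could look like with tf2 (the frame names and the buffer argument are placeholders, not necessarily what my node uses):

import rospy
import tf2_ros
import tf2_geometry_msgs  # registers PointStamped with tf2
from geometry_msgs.msg import PointStamped

def transform_point_to_world(p_C, tf_buffer):
    ps = PointStamped()
    ps.header.frame_id = "usb_camera"  # placeholder camera frame name
    ps.header.stamp = rospy.Time(0)    # use the latest available transform
    ps.point.x, ps.point.y, ps.point.z = p_C
    tf = tf_buffer.lookup_transform("aruco_marker", ps.header.frame_id, rospy.Time(0))
    return tf2_geometry_msgs.do_transform_point(ps, tf)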

Why refraction matters

What the camera “sees” versus the actual position:

  camera                    camera
     \                         \
      \                         \
──────●────  surface     ───────●────
       |                           \
       |  (straight line)           \  (bent ray)
       |                             \
       ● <-- wrong position           ● <-- correct position

The error increases with:

  • depth
  • viewing angle (rays at edge bend more)
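
A quick back-of-the-envelope check of how big that error gets, with a made-up 30° viewing angle and 1 m of depth:

import numpy as np

theta1 = np.radians(30.0)                  # incidence angle in air
theta2 = np.arcsin(np.sin(theta1) / 1.33)  # Snell's law, air -> water
depth = 1.0                                # metres below the surface

naive = depth * np.tan(theta1)   # where the straight-line assumption lands
actual = depth * np.tan(theta2)  # where the bent ray actually lands
print(f"horizontal error: {naive - actual:.2f} m")  # ~0.17 m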