Summary

The data consists of two stereo image pairs collected from cameras mounted on a vehicle. A target vehicle in the field of view of the stereo setup approaches the ego vehicle from a distance of about 0.5 km. Even at the farthest distance, we observe dense depth data on the target vehicle, with ~600 valid pixels. This enables the object detection and tracking module to reliably classify it as an object across all frames. The bird's-eye-view (BEV) visualization shows correspondingly higher point cloud density at the target locations.

The full data can be accessed here: AWS S3 link

Camera Specifications

| Parameter | Value |
| --- | --- |
| Horizontal field of view | 30 degrees |
| Baseline | 1.14 meters |
| Resolution | 5.4 MP |
| Bit depth | 16 bit |
| Frame rate | 5 FPS |
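
As a rough sanity check on what the baseline implies at the ~0.5 km range mentioned in the summary, the sketch below applies the standard rectified-stereo relation Z = f * B / d. It assumes an ideal rectified pair and uses the left-camera fx from the intrinsics section as a stand-in for the rectified focal length, which may differ in practice. The function names are illustrative and not part of the dataset tooling.

```python
def disparity_px(depth_m, focal_px, baseline_m):
    """Disparity in pixels of a point at depth_m for an ideal rectified pair."""
    return focal_px * baseline_m / depth_m


def depth_from_disparity(disp_px, focal_px, baseline_m):
    """Inverse relation: depth in meters from a disparity in pixels."""
    return focal_px * baseline_m / disp_px


def depth_step_per_px(depth_m, focal_px, baseline_m):
    """Approximate depth change (m) caused by a 1 px disparity change."""
    return depth_m ** 2 / (focal_px * baseline_m)


# Assumption: the unrectified left-camera fx approximates the rectified focal length.
FOCAL_PX = 5279.4
BASELINE_M = 1.14  # from the camera specifications

d = disparity_px(500.0, FOCAL_PX, BASELINE_M)
print(f"disparity at 500 m: {d:.2f} px")  # prints "disparity at 500 m: 12.04 px"
print(f"depth change per pixel at 500 m: {depth_step_per_px(500.0, FOCAL_PX, BASELINE_M):.1f} m")
```

Under these assumptions, a target at 0.5 km still yields roughly 12 px of disparity, so sub-pixel matching accuracy matters a great deal at that range.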

Topbot Images

These are vertically concatenated raw images from the left and right cameras, as shown below:

000000200.tiff

Camera Intrinsic Parameters

Assuming a pinhole camera model, we obtain the following intrinsics for the left camera (1) and the right camera (2):

| Parameter | Left camera (prefix i1_) | Right camera (prefix i2_) |
| --- | --- | --- |
| fx | 5279.4 | 5277.86 |
| fy | 5283.29 | 5279.38 |
| cx | 1458.8 | 1412.71 |
| cy | 977.89 | 987.834 |
| k1 | -0.192601 | -0.196102 |
| k2 | 0.284897 | 0.312148 |
| k3 | -0.522051 | -0.467376 |
| k4 | 0 | 0 |
| k5 | 0 | 0 |
| k6 | 0 | 0 |
| p1 | 0.00389653 | 0.00353418 |
| p2 | -0.000288763 | 2.75581e-05 |
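
To illustrate how these parameters are typically used, the sketch below projects a 3D point in the camera frame to raw pixel coordinates. It assumes the coefficients follow the OpenCV-style rational distortion model (k1 to k6 radial, p1 and p2 tangential); the source does not state which convention was used, so treat that as an assumption. The `project` helper and the `LEFT` dictionary are illustrative names.

```python
# Left-camera intrinsics copied from the table above.
LEFT = dict(
    fx=5279.4, fy=5283.29, cx=1458.8, cy=977.89,
    k1=-0.192601, k2=0.284897, k3=-0.522051,
    k4=0.0, k5=0.0, k6=0.0,
    p1=0.00389653, p2=-0.000288763,
)


def project(point_xyz, cam):
    """Project a camera-frame 3D point (meters) to distorted pixel coordinates.

    Assumption: OpenCV-style rational radial + tangential distortion model.
    """
    X, Y, Z = point_xyz
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Normalized image coordinates.
    x, y = X / Z, Y / Z
    r2 = x * x + y * y
    # Rational radial distortion factor.
    radial = (1 + cam["k1"] * r2 + cam["k2"] * r2**2 + cam["k3"] * r2**3) / (
        1 + cam["k4"] * r2 + cam["k5"] * r2**2 + cam["k6"] * r2**3
    )
    # Apply radial and tangential distortion.
    xd = x * radial + 2 * cam["p1"] * x * y + cam["p2"] * (r2 + 2 * x * x)
    yd = y * radial + cam["p1"] * (r2 + 2 * y * y) + 2 * cam["p2"] * x * y
    # Map to pixel coordinates with the pinhole intrinsics.
    return cam["fx"] * xd + cam["cx"], cam["fy"] * yd + cam["cy"]


# A point on the optical axis maps to the principal point (cx, cy).
u, v = project((0.0, 0.0, 500.0), LEFT)
print(u, v)
```

A point on the optical axis is unaffected by distortion, which makes it a convenient first check before projecting off-axis points.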

Extrinsic Parameters

We choose the center of the left rectified image as our frame of reference, where the z-axis faces in the forward direction and the y-axis points in the downward direction. Consequently, the right camera is located along the x-axis in our chosen frame of reference.

image.png

The translation (in m) and rotation (in degrees) of the right camera w.r.t. our frame of reference are shown below:

| Parameter | Value |
| --- | --- |
| Tx | 1.1439 |
| Ty | -0.0042 |
| Tz | -0.0030 |
| theta_x | -0.08825224750231324 |
| theta_y | 1.484531545908633 |
| theta_z | 0.04643460649819143 |

We follow the Z-Y-X Euler angle rotation convention. Consequently, the overall rotation matrix is given by R = R_z * R_y * R_x.
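
The composition R = R_z * R_y * R_x can be sketched in plain Python as follows. This assumes standard right-handed elemental rotation matrices and angles given in degrees; the source does not state whether R maps right-camera coordinates into the reference frame or the reverse, so check that against your own pipeline. The function names are illustrative.

```python
import math


def rot_x(a):
    """Elemental rotation about the x-axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    """Elemental rotation about the y-axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    """Elemental rotation about the z-axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyx_to_matrix(theta_x_deg, theta_y_deg, theta_z_deg):
    """Build R = R_z * R_y * R_x from Euler angles given in degrees."""
    ax, ay, az = (math.radians(t) for t in (theta_x_deg, theta_y_deg, theta_z_deg))
    return matmul(rot_z(az), matmul(rot_y(ay), rot_x(ax)))


# Extrinsic rotation angles of the right camera, copied from above.
R = euler_zyx_to_matrix(-0.08825224750231324, 1.484531545908633, 0.04643460649819143)
for row in R:
    print(["%.6f" % v for v in row])
```

Because all three angles are small, R is close to the identity, with the largest off-diagonal terms coming from the ~1.48 degree rotation about the y-axis.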

Left-rectified Images

The left-rectified image is the left-camera image after rectification, as shown below: