The data was recorded at an industrial facility, with a wide variety of objects in the field of view including heavy machinery, vehicles, workers, storage containers, buildings, and pipes. Time-synchronized stereo image pairs were recorded at 2 FPS from a vehicle-mounted setup. The scene aptly depicts a cluttered environment where network-based classification approaches tend to degrade due to the presence of out-of-class objects. Traditional lidar, on the other hand, suffers from low point density on distant targets, which undermines accurate 3D perception of the environment. The depth map retains high resolution even on low-aspect-ratio targets, which are often overlooked by other sensing modalities, while preserving crisp object edges.
The full data can be accessed here: AWS S3 link
| Parameter | Value |
| --- | --- |
| Horizontal Field of View | 30 degrees |
| Baseline | 1.14 meters |
| Resolution | 5.4 MP |
| Bit depth | 8 bit |
| Frame rate | 2 FPS |
Each frame stores the raw images from the left and right cameras, vertically concatenated, as shown below:
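As a minimal sketch, the following Python/OpenCV snippet splits one concatenated frame back into its two raw images. The file name and the left-on-top ordering are assumptions for illustration, not part of the dataset specification:

```python
import cv2

# Load one vertically concatenated raw frame without altering its bit depth.
# "frame_000000.png" is a hypothetical file name for illustration.
stacked = cv2.imread("frame_000000.png", cv2.IMREAD_UNCHANGED)

# Split into top and bottom halves; we assume the left camera is on top.
h = stacked.shape[0] // 2
left_raw = stacked[:h]
right_raw = stacked[h:]
```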
Assuming a pinhole model for the cameras, we get the following intrinsics for our left camera (1) and right camera (2):
| Parameter | Left camera (1) | Right camera (2) |
| --- | --- | --- |
| fx | 5300.87 | 5296.24 |
| fy | 5301.37 | 5295.54 |
| cx | 1441.62 | 1425.58 |
| cy | 940.999 | 933.17 |
| k1 | -0.0992041 | -0.105835 |
| k2 | 0.0274531 | 0.125799 |
| k3 | 1.37336 | 0.573352 |
| k4 | 0.0 | 0.0 |
| k5 | 0.0 | 0.0 |
| k6 | 0.0 | 0.0 |
| p1 | 0.000950591 | 0.000234119 |
| p2 | -1.55112e-05 | -0.000643232 |
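These parameters map directly onto NumPy arrays in the layout OpenCV expects; note that the coefficient ordering (k1, k2, p1, p2, k3, k4, k5, k6) is OpenCV's rational-model convention, which we assume applies here:

```python
import numpy as np

# Pinhole intrinsic matrices K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
K1 = np.array([[5300.87, 0.0, 1441.62],
               [0.0, 5301.37, 940.999],
               [0.0, 0.0, 1.0]])
K2 = np.array([[5296.24, 0.0, 1425.58],
               [0.0, 5295.54, 933.17],
               [0.0, 0.0, 1.0]])

# Distortion coefficients, reordered into OpenCV's
# (k1, k2, p1, p2, k3, k4, k5, k6) convention (an assumption).
dist1 = np.array([-0.0992041, 0.0274531, 0.000950591, -1.55112e-05,
                  1.37336, 0.0, 0.0, 0.0])
dist2 = np.array([-0.105835, 0.125799, 0.000234119, -0.000643232,
                  0.573352, 0.0, 0.0, 0.0])
```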
We choose the center of the left rectified image as our frame of reference, where the z-axis points forward and the y-axis points downward. Consequently, the right camera lies along the x-axis of this frame.
The translation (in m) and rotation (in degrees) of the right camera w.r.t. our frame of reference are shown below:
| Parameter | Value |
| --- | --- |
| Tx (m) | 1.1449 |
| Ty (m) | 0.0009 |
| Tz (m) | 0.0007 |
| theta_x (deg) | -0.36418693812442166 |
| theta_y (deg) | -0.08866673029069523 |
| theta_z (deg) | 0.07364561800016838 |
We follow the Z-Y-X Euler angle rotation convention. Consequently, the overall rotation matrix is given by R = R_z * R_y * R_x.
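A small NumPy sketch of composing this rotation, using the extrinsics from the table above (the angle values are copied verbatim; degrees-to-radians conversion is the only processing):

```python
import numpy as np

def euler_zyx_to_matrix(theta_x_deg, theta_y_deg, theta_z_deg):
    """Compose R = R_z @ R_y @ R_x from Euler angles given in degrees."""
    ax, ay, az = np.deg2rad([theta_x_deg, theta_y_deg, theta_z_deg])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(ax), -np.sin(ax)],
                   [0.0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0.0, np.sin(ay)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(ay), 0.0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0.0],
                   [np.sin(az),  np.cos(az), 0.0],
                   [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx  # Z-Y-X convention: R = R_z * R_y * R_x

# Pose of the right camera w.r.t. the chosen frame of reference.
R = euler_zyx_to_matrix(-0.36418693812442166,
                        -0.08866673029069523,
                        0.07364561800016838)
T = np.array([1.1449, 0.0009, 0.0007])  # translation in meters
```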
The left-rectified image, i.e. the image from the left camera after rectification, is shown below:
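As a hedged end-to-end sketch, rectification maps could be computed with `cv2.stereoRectify`, reusing `K1`, `dist1`, `K2`, `dist2`, `R`, `T`, and `left_raw` from the snippets above. OpenCV expects the rotation and translation that map points from the first camera's frame to the second's, which may differ from the convention in which these extrinsics are reported, so signs or inverses may need adjusting:

```python
import cv2
import numpy as np

# Image size taken from the loaded half-frame rather than assumed.
image_size = (left_raw.shape[1], left_raw.shape[0])  # (width, height)

# Compute rectification transforms; the R and T conventions here are an assumption.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1, dist1, K2, dist2, image_size, R, T.reshape(3, 1))

# Build and apply the undistort + rectify map for the left camera.
map1x, map1y = cv2.initUndistortRectifyMap(
    K1, dist1, R1, P1, image_size, cv2.CV_32FC1)
left_rect = cv2.remap(left_raw, map1x, map1y, cv2.INTER_LINEAR)
```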