The data was recorded at an industrial facility, with a wide variety of objects in the field of view including heavy machinery, vehicles, workers, storage containers, buildings, and pipes. Time-synchronized stereo image pairs were recorded at 2 FPS from a vehicle-mounted setup. The scene aptly depicts a cluttered environment where network-based classification approaches tend to degrade due to the presence of out-of-class objects. Traditional lidar, on the other hand, suffers from low point density on distant targets, which undermines accurate 3D perception of the environment. The depth map retains high resolution even on low-aspect-ratio targets, which are often overlooked by other sensing modalities, while preserving crisp object edges.
The full data can be accessed here: AWS S3 link
| Parameter | Value |
| --- | --- |
| Horizontal Field of View | 30 degrees |
| Baseline | 1.14 meters |
| Resolution | 5.4 MP |
| Bit depth | 8 bit |
| Frame rate | 2 FPS |
Each frame stores the raw images from the left and right cameras, vertically concatenated, as shown below:
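As a minimal sketch, the following Python/OpenCV snippet splits one concatenated frame back into its two raw images. The file name and the left-on-top ordering are assumptions for illustration, not part of the dataset specification:

```python
import cv2

# Load one vertically concatenated raw frame without altering its bit depth.
# "frame_000000.png" is a hypothetical file name for illustration.
stacked = cv2.imread("frame_000000.png", cv2.IMREAD_UNCHANGED)

# Split into top and bottom halves; we assume the left camera is on top.
h = stacked.shape[0] // 2
left_raw = stacked[:h]
right_raw = stacked[h:]
```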
Assuming a pinhole model for the cameras, we get the following intrinsics for our left camera (1) and right camera (2):
| Parameter | Left camera (1) | Right camera (2) |
| --- | --- | --- |
| fx | 5300.87 | 5296.24 |
| fy | 5301.37 | 5295.54 |
| cx | 1441.62 | 1425.58 |
| cy | 940.999 | 933.17 |
| k1 | -0.0992041 | -0.105835 |
| k2 | 0.0274531 | 0.125799 |
| k3 | 1.37336 | 0.573352 |
| k4 | 0.0 | 0.0 |
| k5 | 0.0 | 0.0 |
| k6 | 0.0 | 0.0 |
| p1 | 0.000950591 | 0.000234119 |
| p2 | -1.55112e-05 | -0.000643232 |
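These parameters map directly onto NumPy arrays in the layout OpenCV expects; note that the coefficient ordering (k1, k2, p1, p2, k3, k4, k5, k6) is OpenCV's rational-model convention, which we assume applies here:

```python
import numpy as np

# Pinhole intrinsic matrices K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
K1 = np.array([[5300.87, 0.0, 1441.62],
               [0.0, 5301.37, 940.999],
               [0.0, 0.0, 1.0]])
K2 = np.array([[5296.24, 0.0, 1425.58],
               [0.0, 5295.54, 933.17],
               [0.0, 0.0, 1.0]])

# Distortion coefficients, reordered into OpenCV's
# (k1, k2, p1, p2, k3, k4, k5, k6) convention (an assumption).
dist1 = np.array([-0.0992041, 0.0274531, 0.000950591, -1.55112e-05,
                  1.37336, 0.0, 0.0, 0.0])
dist2 = np.array([-0.105835, 0.125799, 0.000234119, -0.000643232,
                  0.573352, 0.0, 0.0, 0.0])
```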
We choose the center of the left rectified image as our frame of reference, where the z-axis points forward and the y-axis points downward. Consequently, the right camera lies along the x-axis of this frame.
The translation (in m) and rotation (in degrees) of the right camera w.r.t. our frame of reference are shown below:
| Parameter | Value |
| --- | --- |
| Tx (m) | 1.1449 |
| Ty (m) | 0.0009 |
| Tz (m) | 0.0007 |
| theta_x (deg) | -0.36418693812442166 |
| theta_y (deg) | -0.08866673029069523 |
| theta_z (deg) | 0.07364561800016838 |
We follow the Z-Y-X Euler angle rotation convention. Consequently, the overall rotation matrix is given by R = R_z * R_y * R_x.
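A small NumPy sketch of composing this rotation, using the extrinsics from the table above (the angle values are copied verbatim; degrees-to-radians conversion is the only processing):

```python
import numpy as np

def euler_zyx_to_matrix(theta_x_deg, theta_y_deg, theta_z_deg):
    """Compose R = R_z @ R_y @ R_x from Euler angles given in degrees."""
    ax, ay, az = np.deg2rad([theta_x_deg, theta_y_deg, theta_z_deg])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(ax), -np.sin(ax)],
                   [0.0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0.0, np.sin(ay)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(ay), 0.0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0.0],
                   [np.sin(az),  np.cos(az), 0.0],
                   [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx  # Z-Y-X convention: R = R_z * R_y * R_x

# Pose of the right camera w.r.t. the chosen frame of reference.
R = euler_zyx_to_matrix(-0.36418693812442166,
                        -0.08866673029069523,
                        0.07364561800016838)
T = np.array([1.1449, 0.0009, 0.0007])  # translation in meters
```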
The left-rectified image, i.e. the image from the left camera after rectification, is shown below:
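As a hedged end-to-end sketch, rectification maps could be computed with `cv2.stereoRectify`, reusing `K1`, `dist1`, `K2`, `dist2`, `R`, `T`, and `left_raw` from the snippets above. OpenCV expects the rotation and translation that map points from the first camera's frame to the second's, which may differ from the convention in which these extrinsics are reported, so signs or inverses may need adjusting:

```python
import cv2
import numpy as np

# Image size taken from the loaded half-frame rather than assumed.
image_size = (left_raw.shape[1], left_raw.shape[0])  # (width, height)

# Compute rectification transforms; the R and T conventions here are an assumption.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1, dist1, K2, dist2, image_size, R, T.reshape(3, 1))

# Build and apply the undistort + rectify map for the left camera.
map1x, map1y = cv2.initUndistortRectifyMap(
    K1, dist1, R1, P1, image_size, cv2.CV_32FC1)
left_rect = cv2.remap(left_raw, map1x, map1y, cv2.INTER_LINEAR)
```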