December 6, 2024

We are often asked, “Do you really need to calibrate every frame?” Even computer vision experts are surprised to learn that the answer is yes, especially for wide-baseline stereo cameras.

Stereo vision cameras measure depth by triangulation: the distance to an object, or “feature,” is calculated geometrically from the distance between the two cameras (the “baseline”) and the angles from each camera to a common feature in the image. Therefore, anything that introduces angular errors, such as slight tilts and rotations of the cameras, can cause catastrophic errors: either the depth is incorrect or, worse, the feature can no longer be matched between the cameras.
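To make the triangulation argument concrete, here is a minimal Python sketch of rectified-stereo depth, $Z = fB/d$ (focal length $f$ in pixels, baseline $B$, disparity $d$), and of how a small relative yaw error shifts the measured disparity. It assumes an ideal pinhole camera, a horizontal resolution of 2880 px for the 5.4 MP sensor (an assumption; the text gives only the megapixel count), and small-angle behavior near the image center. The helper names are hypothetical.

```python
import math

WIDTH_PX = 2880    # assumed horizontal resolution of a 5.4 MP sensor
HFOV_DEG = 30.0    # horizontal field of view from the text
BASELINE_M = 1.1   # test-vehicle baseline from the text


def focal_px(width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels from horizontal resolution and FOV."""
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Rectified-stereo triangulation: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("non-positive disparity: no valid depth")
    return f_px * baseline_m / disparity_px


def depth_with_yaw_error(f_px: float, baseline_m: float,
                         true_depth_m: float, yaw_err_deg: float) -> float:
    """Depth reported when the relative yaw between the cameras is off.

    Near the image center, a relative yaw error adds roughly f * tan(err)
    pixels to the measured disparity; the sign depends on the rotation direction.
    """
    true_disp = f_px * baseline_m / true_depth_m
    shift = f_px * math.tan(math.radians(yaw_err_deg))
    return depth_from_disparity(f_px, baseline_m, true_disp + shift)


f = focal_px(WIDTH_PX, HFOV_DEG)  # roughly 5374 px
for err in (0.0, 0.108, -0.108):
    z = depth_with_yaw_error(f, BASELINE_M, 100.0, err)
    print(f"yaw error {err:+.3f} deg -> reported depth {z:.1f} m (true 100 m)")
```

With these assumptions, a 0.108° yaw error moves a 100 m target to roughly 85 m or 121 m depending on the rotation direction. For a target near 1 km the true disparity is only about 6 px, so the opposite rotation drives the disparity negative and no valid depth can be computed at all.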

Ultra-wide-baseline stereo cameras, defined here as stereo cameras with baselines exceeding 50 cm, are particularly susceptible to perturbations, shock, and vibration. The relative angular error between the left and right cameras grows with the square of the baseline length (here is a quick derivation). For example, the relative angular deflection between two cameras mounted on a 1-m bar of 1-inch square extruded aluminum subjected to a 10g shock (“a bump on the road”) is 0.108°. This large deflection corresponds to about 10 pixels for a 5.4 MP camera with a 30° field of view (FOV) and renders the depth map invalid. For a typical stereo camera with a 0.1-m baseline, the relative angular deflection is 100 times smaller, only 0.00108°, which does not noticeably degrade the depth map. Because the error scales as $B^2$, frame-by-frame calibration becomes more important as the baseline length $B$ increases, and ultra-wide-baseline stereo vision cameras require advanced online calibration software.
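The pixel figures above can be reproduced with a short calculation. The sketch below assumes an ideal pinhole camera and a horizontal resolution of 2880 px for the 5.4 MP sensor (our assumption, not stated in the text). It takes the 0.108° deflection at a 1-m baseline from the text and applies the stated $B^2$ scaling; it does not derive the deflection from beam mechanics.

```python
import math

WIDTH_PX = 2880   # assumed horizontal resolution of a 5.4 MP sensor (2880 x 1860)
HFOV_DEG = 30.0   # horizontal field of view from the text


def focal_px(width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels."""
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def angle_to_px(angle_deg: float, f_px: float) -> float:
    """Image shift (pixels) caused by a small rotation, near the image center."""
    return f_px * math.tan(math.radians(angle_deg))


def scale_deflection(ref_deg: float, ref_baseline_m: float, baseline_m: float) -> float:
    """Scale a relative deflection using the B**2 law stated in the text."""
    return ref_deg * (baseline_m / ref_baseline_m) ** 2


f = focal_px(WIDTH_PX, HFOV_DEG)
for b in (1.0, 0.1):
    deg = scale_deflection(0.108, 1.0, b)
    print(f"B = {b:.1f} m: {deg:.5f} deg -> {angle_to_px(deg, f):.2f} px")
# B = 1.0 m: 0.10800 deg -> 10.13 px
# B = 0.1 m: 0.00108 deg -> 0.10 px
```

The pinhole conversion gives about 10.1 px for the 1-m case, consistent with the "10 pixels" quoted above, and about a tenth of a pixel for the 0.1-m case.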

Test vehicle with cameras integrated into the upper left and right windshield corners.


NODAR has a test vehicle equipped with our ultra-wide-baseline stereo vision system. The system has two 5.4 MP, 30° FOV, HDR cameras mounted in the upper left and upper right corners of the windshield, approximately 1.1 m apart (see image above). We drove this car on a cobblestone road; a video of the depth maps with and without frame-by-frame calibration is shown below.

Depth maps without (left) and with (right) frame-by-frame calibration.


Driving on a cobblestone road caused the cameras to shake and misalign, which resulted in an invalid depth map (left). Adding software that quickly calibrates the camera extrinsic parameters every frame allows accurate reconstruction of the depth map (right).

Calibrating every frame required more than efficient algorithms: NODAR had to develop an algorithm that produces the camera extrinsic parameters with sub-pixel re-projection error from a single snapshot. The well-known keypoint approach to autocalibration [ref] requires averaging about 100–1000 frames to obtain sub-pixel re-projection error across the image. This innovation allowed us, for the first time, to actually see how the extrinsic camera parameters evolve from frame to frame. No one has been able to measure this before! Below is a sequence of the relative roll, pitch, and yaw angles of the cameras in NODAR’s test vehicle as it drove on a “smooth” highway at ~130 km/h.

Extrinsic camera parameters of a 1.1-m baseline stereo vision camera mounted in the upper windshield. The x-axis is the frame number. The plot shows the variations over ~8 minutes of highway driving.

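Why can't multi-frame averaging reveal this motion? A toy simulation illustrates the trade-off. All parameters below are hypothetical and chosen only for illustration; they are not NODAR's measurements or method. The relative pitch vibrates sinusoidally, and a noisy single-frame estimate is averaged over a 100-frame window as a simple stand-in for multi-frame keypoint autocalibration. Averaging shrinks the noise by roughly $1/\sqrt{N}$, but the window mean of a zero-mean vibration is close to zero, so the averaged estimate cannot follow the frame-to-frame motion.

```python
import math
import random
import statistics

# Hypothetical toy parameters (not measured data).
AMP_DEG = 0.01       # amplitude of a sinusoidal relative-pitch vibration
PERIOD_FRAMES = 20   # vibration period, in frames
NOISE_DEG = 0.03     # std. dev. of a noisy per-frame pitch estimate
WINDOW = 100         # frames averaged (stand-in for multi-frame calibration)


def rms(values):
    """Root-mean-square of a sequence."""
    return math.sqrt(statistics.fmean([v * v for v in values]))


def window_mean(xs, k, n):
    """Mean of the n samples ending at index k (inclusive)."""
    return statistics.fmean(xs[k - n + 1:k + 1])


def simulate(n_frames=2000, seed=0):
    rng = random.Random(seed)
    truth = [AMP_DEG * math.sin(2 * math.pi * k / PERIOD_FRAMES) for k in range(n_frames)]
    noise = [rng.gauss(0.0, NOISE_DEG) for _ in range(n_frames)]
    frames = range(WINDOW - 1, n_frames)  # frames with a full window behind them
    # A single-frame estimate's error is just its noise.
    per_frame_err = rms(noise)
    # Averaging shrinks the noise roughly as NOISE_DEG / sqrt(WINDOW) ...
    averaged_noise = rms([window_mean(noise, k, WINDOW) for k in frames])
    # ... but the window mean of a zero-mean vibration is ~0, so the averaged
    # estimate misses the frame-to-frame motion entirely.
    tracking_err = rms([window_mean(truth, k, WINDOW) - truth[k] for k in frames])
    return per_frame_err, averaged_noise, tracking_err


per_frame, averaged, tracking = simulate()
print(f"per-frame noise {per_frame:.4f} deg, averaged noise {averaged:.4f} deg, "
      f"tracking error {tracking:.4f} deg")
```

With these toy numbers, averaging cuts the noise from about 0.03° to about 0.003°, yet leaves a tracking error of about $0.01°/\sqrt{2} \approx 0.007°$ that a longer window cannot remove. Only an estimator that is accurate on a single frame can follow vibration of this kind.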

Notice that the cameras vibrate and change their relative angles every frame. These changes must be compensated to achieve extremely accurate stereo vision. In this highway sequence, the relative pitch angle changes by about 0.02°, or 2 pixels (for 5.4 MP, 30° FOV cameras), which significantly degrades most stereo vision algorithms [ref].
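The sketch below, under stated assumptions, converts the 0.02° pitch change to pixels and shows why a row offset hurts matching. It assumes an ideal pinhole camera with a 2880 px horizontal resolution, and it uses a textbook winner-take-all SAD block matcher that searches only along the same image row of a synthetic random-texture pair. This matcher and the helper names are hypothetical stand-ins, not any particular production algorithm.

```python
import math
import random

WIDTH_PX = 2880   # assumed horizontal resolution of a 5.4 MP sensor
HFOV_DEG = 30.0   # horizontal field of view from the text


def focal_px(width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels."""
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def pitch_to_row_offset_px(pitch_deg: float, f_px: float) -> float:
    """Approximate vertical misalignment (pixels) near the image center."""
    return f_px * math.tan(math.radians(pitch_deg))


def make_pair(h, w, disp, row_shift, seed=0):
    """Synthetic rectified pair cut from one random-texture scene.

    right[r][c - disp] == scene[r + row_shift][c] and left[r][c] == scene[r][c],
    so the true disparity is `disp` and `row_shift` models an uncorrected pitch error.
    """
    rng = random.Random(seed)
    scene = [[rng.random() for _ in range(w + disp)] for _ in range(h + row_shift)]
    left = [row[:w] for row in scene[:h]]
    right = [scene[r + row_shift][disp:disp + w] for r in range(h)]
    return left, right


def best_disparity(left, right, row, col, max_disp, half=3):
    """Winner-take-all SAD block matching that searches along one image row only."""
    def sad(d):
        return sum(abs(left[row + dy][col + dx] - right[row + dy][col + dx - d])
                   for dy in range(-half, half + 1)
                   for dx in range(-half, half + 1))
    return min(range(max_disp + 1), key=sad)


def fraction_correct(row_shift, h=40, w=80, disp=8, max_disp=16, half=3):
    """Fraction of sampled pixels whose best disparity equals the true one."""
    left, right = make_pair(h, w, disp, row_shift)
    pixels = [(r, c) for r in range(half, h - half, 4)
              for c in range(max_disp + half, w - half, 4)]
    hits = sum(best_disparity(left, right, r, c, max_disp, half) == disp
               for r, c in pixels)
    return hits / len(pixels)


f = focal_px(WIDTH_PX, HFOV_DEG)
print(f"0.02 deg pitch -> {pitch_to_row_offset_px(0.02, f):.2f} px row offset")
for shift in (0, 2):
    print(f"row offset {shift} px: {fraction_correct(shift):.0%} of pixels matched correctly")
```

A random-noise texture is a worst case: with an uncorrected 2-row offset, the matcher picks the true disparity only about as often as chance would. Real, spatially smooth imagery degrades more gradually, but the matcher's core assumption, that corresponding points lie on the same row, is still violated.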

The Challenge of Long-Range Small Obstacle Detection

Detecting small obstacles at long ranges is no small feat. It requires:

But here’s the catch: longer baselines mean greater exposure to vibrations, and greater vibrations amplify calibration drift, making continuous calibration absolutely essential. That’s where NODAR’s Hammerhead technology comes in. Unlike traditional systems that require manual intervention or downtime for calibration, Hammerhead continuously re-calibrates in real time. It recovers calibration within just a few frames (less than 0.2 seconds), keeping your stereo vision system precise regardless of conditions.

Here are some more examples of Hammerhead in action on the road: