Stereo vision is a mature technology that captures both the 3D geometry and the color of the surrounding environment. It has been used extensively in robotics and manufacturing to capture 3D geometry at resolutions unattainable with lidar. Until now, however, stereo vision has been limited to short-range applications of less than about 5 m. NODAR’s patented ultra-wide-baseline stereo vision technology supports baselines (the distance between cameras) from 0.5 m to 3 m and beyond, extending the range of stereo vision cameras to 1000+ meters. This overview describes the principles behind NODAR’s software products.
Triangulation is a reliable method for estimating range, like a tape measure. It is a direct measurement rather than an inference from indirect cues (as in monocular depth estimation). The range to an object can be expressed explicitly as a function of the angles from two cameras and the distance between the cameras.
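As a minimal sketch of that explicit relationship, the Python function below computes the perpendicular distance from the baseline to a target using the law of sines. The function name and the angle convention (interior angles between the baseline and each camera's line of sight) are assumptions made for illustration and are not taken from NODAR's software.

```python
import math


def triangulate_depth(baseline_m: float, alpha_rad: float, beta_rad: float) -> float:
    """Perpendicular distance (m) from the baseline to the target.

    alpha_rad and beta_rad are the interior angles between the baseline and
    the line of sight of the left and right camera (illustrative convention).
    """
    # Law of sines on the triangle formed by the two cameras and the target:
    # the angle at the target is pi - alpha - beta, whose sine is sin(alpha + beta).
    return (
        baseline_m
        * math.sin(alpha_rad)
        * math.sin(beta_rad)
        / math.sin(alpha_rad + beta_rad)
    )


# Target 100 m straight ahead of the midpoint of a 1-m baseline.
angle = math.atan2(100.0, 0.5)
print(f"{triangulate_depth(1.0, angle, angle):.2f} m")
```

Only the two angles and the baseline enter the calculation, which is what makes triangulation a direct measurement.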
Stereo cameras measure distance through triangulation. The farther apart the cameras are placed (that is, the longer the baseline), the more precisely the range to an object can be triangulated. In fact, the depth uncertainty (sometimes called range resolution) is inversely proportional to the baseline length. Therefore, for the same camera resolution and optics, increasing the baseline by a factor of 10 decreases the depth uncertainty by a factor of 10. Fig. 1 shows a 0.5-m-baseline stereo camera next to a 0.05-m-baseline stereo camera.
Fig. 1. Ultra-wide-baseline stereo camera with 0.5-m baseline vs. standard stereo camera with 0.05-m baseline.
The depth uncertainty, $\Delta z$, for a stereo vision camera is derived in Fig. 2, and is equal to
$$ \Delta z = \frac{\text{IFOV}\, R^2\, \delta}{B}, $$
(Eq. 1)
where $\text{IFOV}$ is the instantaneous field of view, $R$ is the range to the object, $\delta$ is the disparity measurement resolution ($\delta=0.1$ pixels for Hammerhead software), and $B$ is the baseline length. Fig. 2 shows graphically that as the baseline increases, the depth uncertainty improves.
Fig. 2. Depth uncertainty improves as the baseline length increases.
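To make Eq. 1 concrete, the sketch below evaluates it for the two baselines shown in Fig. 1. The function name, the 50-m range, and the IFOV value (taken as the reciprocal of a hypothetical 5000-pixel focal length) are assumptions chosen for illustration; only the formula and the default $\delta = 0.1$ pixels come from the text.

```python
def depth_uncertainty(
    ifov_rad_per_px: float,
    range_m: float,
    baseline_m: float,
    delta_px: float = 0.1,
) -> float:
    """Depth uncertainty (m) from Eq. 1: IFOV * R^2 * delta / B."""
    return ifov_rad_per_px * range_m**2 * delta_px / baseline_m


# Hypothetical optics: focal length of 5000 pixels, so IFOV = 1/5000 rad per pixel.
ifov = 1.0 / 5000.0

for baseline in (0.05, 0.5):
    dz = depth_uncertainty(ifov, range_m=50.0, baseline_m=baseline)
    print(f"baseline {baseline:.2f} m -> depth uncertainty {dz:.2f} m at 50 m")
```

Under these assumed values, the tenfold increase in baseline reduces the depth uncertainty tenfold, while doubling the range would quadruple it.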
Eq. 1 also shows that the depth uncertainty increases as the square of the range, $R^2$, which is the main reason previous stereo vision cameras were limited to ranges of less than about 5 m. The depth uncertainty is plotted as a function of range in Fig. 3 for both lidar and stereo vision systems, using the substitution $\text{IFOV} = 1/f$ with the focal length $f$ expressed in pixels. The range resolution is the minimum separation in range at which two objects can still be distinguished before they blend together, i.e., how well the system can resolve two closely spaced objects in range. To be clear, range resolution is not signal-dependent and is not the same as range precision or range accuracy. Precision is derived from resolution by averaging multiple returns and depends on SNR, which is range-dependent.
The range resolution of lidar is the point spread function in range, that is, the spread of returns from a flat target at normal incidence, and is set by the bandwidth of the lidar transceiver. Typically, lidars have 4-ns pulses with a matching 250-MHz optoelectronic receiver, which corresponds to a range resolution of about 60 cm ($c\tau/2$ for a pulse width $\tau$). In Fig. 3, the lidar curve (orange) is a horizontal line: the range resolution is the same at all ranges because it depends only on the laser pulse width (more precisely, the lidar transceiver bandwidth).
Fig. 3. Range resolution vs. range.
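The 60-cm lidar figure follows from the standard pulse-width relation, range resolution $= c\tau/2$, where $\tau$ is the pulse width and the factor of 2 accounts for the round trip. The short sketch below reproduces the arithmetic; the function name is an illustrative choice.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def lidar_range_resolution(pulse_width_s: float) -> float:
    """Pulse-limited range resolution (m): c * tau / 2."""
    # The pulse travels to the target and back, hence the division by 2.
    return SPEED_OF_LIGHT_M_PER_S * pulse_width_s / 2.0


resolution_m = lidar_range_resolution(4e-9)  # 4-ns pulse from the text
print(f"{resolution_m * 100:.0f} cm")
```

Because range does not appear in the expression, the lidar curve in Fig. 3 is flat.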
The range resolution of stereo vision is given by the depth uncertainty equation, Eq. 1, with $\delta$ set equal to one pixel, the minimum feature size that can be discerned optically. Fig. 3 shows the range resolution of three stereo cameras with different focal lengths and baselines. Standard stereo vision with a 0.1-m baseline (yellow line) has better range resolution than lidar up to a range of 5 m, ultra-wide stereo vision with a 1-m baseline (blue line) is better than lidar up to 55 m, and ultra-wide stereo vision with a 3-m baseline (green line) is better than lidar up to 140 m. Increasing the baseline of stereo vision cameras therefore opens up long-range sensing applications.
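The crossover ranges can be estimated by setting Eq. 1 (with $\text{IFOV} = 1/f$) equal to the lidar range resolution and solving for $R$, which gives $R = \sqrt{\Delta z_{\text{lidar}}\, f\, B / \delta}$. The sketch below does this in Python. The focal lengths used for Fig. 3 are not stated in the text, so the 5000-pixel value here is a hypothetical chosen only for illustration, as is the function name.

```python
import math


def crossover_range(
    lidar_resolution_m: float,
    focal_length_px: float,
    baseline_m: float,
    delta_px: float = 1.0,
) -> float:
    """Range (m) at which stereo range resolution equals the lidar's.

    Solves R^2 * delta / (f * B) = lidar_resolution for R.
    """
    return math.sqrt(lidar_resolution_m * focal_length_px * baseline_m / delta_px)


# Hypothetical 5000-px focal length, 1-m baseline, 0.6-m lidar resolution.
r_cross = crossover_range(0.6, focal_length_px=5000.0, baseline_m=1.0)
print(f"stereo resolves range better than lidar out to about {r_cross:.0f} m")
```

Note that the crossover range grows only as the square root of the baseline and focal length, so tripling the baseline alone extends it by a factor of about 1.7.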
Fig. 3.5. Example Hammerhead depth map and point cloud outputs for a car driving on the highway with cameras in the upper left and right windshield: 1.1-m baseline, 30° FOV, 5.4 MP cameras (Sony IMX490).
While the concept of widening the baseline to achieve accurate 3D point clouds at long range is straightforward, producing durable, production-grade stereo cameras for outdoor use on mobile platforms (such as cars, trucks, or wind-exposed infrastructure) has been impractical due to calibration challenges. Stereo vision relies on measuring tiny angles to determine object locations, making it highly sensitive to disruptions. Even minute angular shifts of just 0.01° in the cameras could result in significant range errors or render the stereo matcher unable to find reliable correspondences, preventing it from reporting any range data.
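A rough sense of scale for the 0.01° figure: for a distant target, the range is approximately $R \approx B/\theta$, where $\theta$ is the small angle subtended by the baseline at the target, so an angular error $\Delta\theta$ produces a first-order range error of about $R^2\,\Delta\theta / B$. The sketch below evaluates this under the assumption that the shift is a relative rotation between the cameras in the plane containing the baseline; the function name and the example ranges are illustrative.

```python
import math


def range_error_from_misalignment(
    range_m: float, baseline_m: float, misalignment_deg: float
) -> float:
    """First-order range error (m) caused by a small relative camera rotation.

    Assumes the rotation adds directly to the triangulation angle
    (small-angle approximation, R ~ B / theta).
    """
    return range_m**2 * math.radians(misalignment_deg) / baseline_m


for r in (50.0, 100.0, 300.0):
    err = range_error_from_misalignment(r, baseline_m=1.0, misalignment_deg=0.01)
    print(f"range {r:.0f} m -> error of about {err:.1f} m")
```

Under these assumptions the error grows with the square of the range, so a shift that is tolerable at short range becomes significant at long range.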
In fact, the angular disturbance of cameras caused by environmental factors increases with the square of the baseline length, effectively limiting practical stereo vision systems to baseline lengths of just a few tens of centimeters. While longer-baseline systems have been documented in the literature, they are predominantly confined to indoor, static applications and require frequent recalibration. Fig. 4 shows that when the same force is applied to the end of a narrow-baseline and a wide-baseline stereo camera, with the wide baseline 10 times longer, the slope at the end of the wider beam changes 100 times more than at the end of the narrower beam. For more details, we provide an application note that explains the necessity of frame-to-frame calibration [here].
Fig. 4. A short-baseline stereo camera (top) and an ultra-wide-baseline stereo camera (bottom) with 10x the baseline length. For the same force, the angular displacement of the ultra-wide-baseline beam is 100x larger, making it much more sensitive to calibration issues. Hammerhead compensates for large calibration errors in software.
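The quadratic scaling can be illustrated with the textbook Euler-Bernoulli result for a cantilever loaded at its free end, where the slope at the tip is $\theta = F L^2 / (2 E I)$. The sketch below assumes the camera mount behaves like such a beam with length equal to the baseline. The force, Young's modulus (roughly that of aluminum), and second moment of area are hypothetical values chosen only for illustration.

```python
def cantilever_end_slope(
    force_n: float,
    length_m: float,
    youngs_modulus_pa: float,
    second_moment_m4: float,
) -> float:
    """Slope (rad) at the free end of a cantilever with a point load at its tip."""
    return force_n * length_m**2 / (2.0 * youngs_modulus_pa * second_moment_m4)


# Hypothetical beam: 10 N tip load, E ~ 69 GPa (aluminum), I = 1e-8 m^4.
FORCE_N = 10.0
E_PA = 69e9
I_M4 = 1e-8

short = cantilever_end_slope(FORCE_N, 0.05, E_PA, I_M4)
wide = cantilever_end_slope(FORCE_N, 0.5, E_PA, I_M4)
print(f"slope ratio (0.5-m vs 0.05-m beam): {wide / short:.0f}x")
```

The ratio depends only on the lengths, so the factor of 100 for a 10x longer beam holds regardless of the assumed material and cross-section.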