As previously mentioned, a limitation of the conventional approach to stereo driving is that it relies on precise metric calibration with respect to an external calibration target in order to convert matches to 3-D points. From a practical standpoint, this is a serious limitation in scenarios in which the sensing hardware cannot be physically accessed, such as in the case of planetary exploration. In particular, this limitation implies that the vision system must remain perfectly calibrated over the course of an entire mission. From a philosophical point of view, navigation should not require the precise knowledge of the 3-D position of points in the scene: What is important is how much a point deviates from the reference ground plane, not its exact position.
Based on these observations, we developed an approach in which a relative measure of height with respect to a ground plane is computed from the matches without requiring knowledge of the full set of camera parameters. This height is relative in the sense that it is a multiple of the true height by an unknown scale factor. We now describe the construction of the relative height in detail. The geometry described below has been used in earlier work in which a point is classified as belonging to one of two halfspaces based on its projections in two uncalibrated images [11].
Let us consider first a flat ground plane observed by two cameras. We assume that the only known information about the geometry of the cameras is the epipolar geometry. Let P be a generic point on the plane and p_l and p_r be its projections in the left and right images, respectively. We represent points in the image plane by 2-D projective coordinates, p = (u, v, 1)^T, where the usual Cartesian image coordinates are u and v. It can be easily shown that for any point P on the plane, the projections p_l and p_r are related by a linear projective transformation, or homography, H: p_l = H p_r. In this relation, the symbol = means that the two sides are equal in the projective sense, i.e., that their coordinates are proportional. Intuitively, H maps a pixel from the right image to its location in the left image assuming that the corresponding 3-D scene point lies on the plane (Figure 2(a)).
The homography H is a 3x3 matrix defined up to a scale factor. H can be easily estimated from real images in the following way. First, features p_l^i are selected in the left image and the corresponding pixels p_r^i in the right image are computed using the algorithm of Section 2.1. Then, the parameters of H are computed by minimizing the least-squares criterion: sum_i || p_l^i - H p_r^i ||^2. The features used in the computation of H may be anywhere in the image. Moreover, computing H does not require any information on the actual 3-D positions of the scene points used as features.
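As a concrete illustration, the least-squares fit of the homography from matched features can be sketched with the standard direct linear transform (DLT); the function name and the use of an SVD are our own choices for the sketch, not the paper's implementation:

```python
import numpy as np

def fit_homography(pts_right, pts_left):
    """Estimate the 3x3 homography H such that p_left ~ H p_right.

    pts_right, pts_left: (N, 2) arrays of matched pixel coordinates.
    Each match gives two linear constraints on the 9 entries of H;
    the least-squares solution is the right singular vector of the
    constraint matrix associated with the smallest singular value.
    """
    pts_right = np.asarray(pts_right, dtype=float)
    pts_left = np.asarray(pts_left, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(pts_right, pts_left):
        # From u = (h1 . p) / (h3 . p) and v = (h2 . p) / (h3 . p)
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # remove the free projective scale
```

Note that nothing in this fit refers to 3-D coordinates: only pixel correspondences enter the criterion, which is the point of the construction.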
We now show that H is all we need to compute a relative elevation map. Consider a world point P not necessarily on the ground plane and its projections p_l and p_r. Let us assume that we also have defined once and for all a "reference" point O described by its projections o_l and o_r. O may be anywhere in space so long as it is not on the reference plane. Of course the point O is not known; only its projections in the images are known. Let us consider now the image points q = H^{-1} p_l and q_o = H^{-1} o_l. Point q (resp. q_o) is the point at which P (resp. O) would be projected in the right image if it were on the ground plane. Finally, consider the point s, the intersection in the right image of the two segments p_r o_r and q q_o. Since p_r o_r is the projection of the segment PO and q q_o is the projection of a line segment contained in the plane, the intersection point s must be the projection of the intersection of PO with the reference plane (Figure 2(b)).
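The intersection used in this construction is conveniently computed in homogeneous coordinates: the line through two image points is their cross product, and the meet of two lines is again a cross product. A minimal sketch (function names are ours):

```python
import numpy as np

def hom(p):
    """Lift a 2-D pixel to homogeneous coordinates."""
    return np.array([p[0], p[1], 1.0])

def intersect(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line
    through p3, p4; all arguments are 2-D pixel coordinates.
    """
    line_a = np.cross(hom(p1), hom(p2))  # line through p1 and p2
    line_b = np.cross(hom(p3), hom(p4))  # line through p3 and p4
    s = np.cross(line_a, line_b)         # meet of the two lines
    return s[:2] / s[2]                  # back to Cartesian pixels
```

The homogeneous formulation also handles gracefully the near-parallel case, where the Cartesian intersection recedes toward infinity (the third coordinate of s approaches zero).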
The previous reasoning shows that we now have a way to compute the intersection of the line joining a point and a reference point with a reference plane without computing the actual 3-D position of the point. Now, let O become the point at infinity in a given direction. The intersection s becomes the image of the projection P' of P onto the reference plane in the direction given by O (Figure 2(c)). Because s is now the projection of P', the image distance between p_r and s is directly related to the height of P with respect to the reference plane. In practice, we use another reference point U, with projection u_r in the right image, which we declare to be at height one from the reference plane. If s_u is the image of the projection of U on the reference plane, then the height is defined as: h = |p_r s| / |u_r s_u| (Figure 2(d)).
In affine geometry, this definition of height is exact in the sense that h is
proportional to the distance between P and the reference plane. In the projective case, an additional reference
plane is required in order to define a concept of projective height. However,
the affine approximation is accurate enough for the purpose of navigation
because the heights are computed at relatively long range from the camera and
over a relatively shallow depth of field.
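Under the affine approximation, the height reduces to a ratio of image distances. A minimal sketch (names are ours): given the image of a point, the image of its projection onto the reference plane, and the same pair for the unit-height reference point, the relative height is:

```python
import numpy as np

def relative_height(p_r, s_p, u_r, s_u):
    """Relative height under the affine approximation.

    p_r : image of the world point in the right image
    s_p : image of its projection onto the reference plane
    u_r : image of the unit-height reference point
    s_u : image of its projection onto the reference plane
    Returns the height of the point in units of the (unknown)
    height of the reference point.
    """
    return (np.linalg.norm(np.asarray(p_r) - np.asarray(s_p))
            / np.linalg.norm(np.asarray(u_r) - np.asarray(s_u)))
```

The unknown scale factor mentioned earlier is exactly the true height of the unit reference point; it cancels out of any comparison between heights computed this way.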
The relative height is also used for limiting the search in the stereo
matching. More precisely, we define an interval of heights [h_min, h_max]
which we anticipate in a typical terrain. This interval is converted at each
pixel to a disparity range [d_min, d_max]. This effectively limits the search
to disparities that are physically meaningful at each pixel.
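The conversion from a height interval to a per-pixel disparity range can be sketched under two assumptions of ours that the paper does not state: the images are rectified, so the ground plane induces a disparity that is an affine function of pixel position, and a hypothetical per-pixel gain k converts relative height into additional parallax:

```python
def disparity_range(u, v, ground_plane, k, h_min, h_max):
    """Disparity search interval at pixel (u, v).

    ground_plane: coefficients (a, b, c) such that a ground-plane
    pixel at (u, v) has disparity a*u + b*v + c (affine model for a
    rectified pair -- an assumption of this sketch).
    k: hypothetical gain converting relative height to extra
    disparity at this pixel. h_min, h_max: anticipated terrain heights.
    """
    a, b, c = ground_plane
    d_ground = a * u + b * v + c        # disparity if the pixel lay on the plane
    d_lo = d_ground + k * h_min
    d_hi = d_ground + k * h_max
    return min(d_lo, d_hi), max(d_lo, d_hi)
```

The matcher then scans only this interval at each pixel, instead of a fixed global disparity range.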
In addition to the relative elevation, a measure of slope relative to the ground plane can also be computed under minimal knowledge of camera geometry. We do not describe the slope evaluation algorithm here because it is not integrated in the current system. We refer the reader to [12] and [1] for a detailed presentation.