[1]The authors may be contacted at the following addresses by email:
buffa@essi.fr, diard@essi.fr, sander@essi.fr.
See also:
http://www.essi.fr/buffa ,
http://www.essi.fr/diard ,
http://www.essi.fr/sander .

The Virtual Diver, an Architectural ``Swim-Around'' System Incorporating Real Imagery

Michel Buffa, Franck Diard, Mats Persson, Peter Sander

Laboratoire I3S, 06903 Sophia-Antipolis cedex, France
Linköping University, S-581 83 Linköping, Sweden

Abstract:

In this paper, we present the Virtual Diver project. The goal of the project is to apply Virtual Reality methods to the interactive exploration of artificial reefs. We concentrate here on the process of fusing geometric and photometric information in order to present a more realistic simulation. We present aspects of our adaptation of camera calibration methods for registering a CAD model with images, and a novel texture-mapping algorithm based on the Z-buffer.

Introduction

In this paper, we present the Virtual Diver project, work conducted within the Images group at the CNRS laboratory Informatics, Signals, and Systems at Sophia-Antipolis (I3S), in collaboration with the AquaScience Association of the University of Nice - Sophia Antipolis (UNSA).

The Virtual Diver project arose in response to the needs of marine biologists at UNSA to visualize the benthic underwater environment. In particular, they were studying the colonization of artificial reefs by underwater life, which presents certain challenges, not the least of which is access to the research sites located at depths of up to thirty meters. The reefs must be visited regularly to study the colonization over time, but the depths involved can make this a somewhat hazardous procedure available only to experienced scuba-divers. The emerging technology of Virtual Reality (VR) seemed well-suited to enable ``dry-diving'' in a computer-created simulation of the reefs, to give a more realistic representation than would be possible just by viewing photographs or videotape.

The goal of the project is thus to create a real-time walk-around system incorporating real underwater imagery. We expect that this will bring a better understanding of the marine biology of the micro-ecosystem for the biologists while providing a demand-driven application for engineering research into VR systems. Off-the-shelf VR technology is not suitable, as the fundamental premise of the project is the visualization of ``what is really there'' rather than the creation of a synthetic world either ab initio or as the result of computer simulation. Nor is the project strictly of the ``3-D reconstruction from a sequence of (stereo) images'' variety [4,21,12,20], as the structural geometric information is already available in the form of CAD models of the artificial reefs. The challenge is to combine the geometry and the photometry by mapping the images onto the CAD model [17] (sometimes referred to as ``augmented virtuality'') and then to provide user-driven exploration of the composite artificial yet realistic world.

The following section briefly describes the underwater reserve of Monaco and the artificial reefs; §2 is an overview of the Virtual Diver project and the VR tools we are developing; §3 describes the methods for combining the geometric and the photometric information. We present examples on three types of images: synthetic (in order to explain how the computer vision algorithms work), images taken in the laboratory (to evaluate algorithm performance on ``somewhat realistic'' data; generally, the performance of computer vision systems is demonstrated only on such data!), and finally, real images (which are the eventual goal of the project, but which are much more difficult as the biology is never ``clean'').

The paper is not intended to be self-contained, but rather to present an overview of our work in progress. We present some camera-calibration details in App. A however, and the reader interested in the specifics of our algorithms is referred to [8].

The Underwater Reserve of Monaco

 

The reserve was created in 1976 on the instructions of Prince Rainier III of Monaco. Its design and administration are entrusted to the Monegasque Association for the Protection of Nature (AMPN). The reason for creating the reserve was the alarming situation in the area due to overfishing, which was endangering the existence of local species [6].

  
Figure 1: The Underwater Reserve of Monaco. Courtesy of AMPN (photo: Christian Giordan).

To prevent this disappearance, and to produce a zone in which the species still present would be able to reproduce under the most favourable conditions, a protected underwater reserve was established covering an area of 500,000 square meters, with a perimeter of 2.2 kilometres, at depths of from 2 to 38 meters. Inside the protected zone the following activities are strictly prohibited: all forms of fishing from the surface, scuba diving, underwater fishing, all powered navigation, and anchoring.

The Artificial Reefs

In order to make the reserve as attractive as possible to a wide variety of species, it was decided to sink artificial reefs. A first attempt was made at building them from natural rocks (weighing up to 300 tonnes) sunk at a depth of 25 meters, but the shape of the reef could not be sufficiently well controlled. The AMPN then turned to structures which would be easier to transport and which could be assembled on land.

The solution adopted was to cement together rough concrete construction blocks such as those used by the building industry. They have a unit weight of 25 to 35 kg and dimensions which make for easy handling, see Fig. 2.

  
Figure 2: (a) A unit block. The roughness of the walls not only facilitates easier cementing, but also the attachment of benthic organisms. (b) A group of blocks as initially sunk underwater. Courtesy of AMPN (photo: Jean-Marie Moll).

Two different types of artificial reefs were constructed from the rough-wall concrete blocks, the first in the form of a truncated pyramid and a second model in the form of a hollow octagonal structure. The latter type seemed more satisfactory in allowing better light as well as providing a large shaded area in the central shaft, and, importantly for us, the underwater structure is reasonably well represented by a CAD model. Figure 3 shows typical images of an artificial reef in the underwater reserve. Note the difference in the quantity of organisms on the reef between figs. 2(b) and 3 (seven years later).

  
Figure 3: Artificial reef at -28 meters. Courtesy of AMPN (photos: Roberto Pronzato, Jean Norbert Monot).

The Virtual Diver project

 

The aim of the project is to allow the highly-interactive exploration of (images of) the reefs, first in 3-D at a given moment in time, and eventually through images collected at regular intervals over time as well. By this we mean a VR system where it is possible to move through a computer-generated world that is an ``exact'' copy of the underwater reserve, created by mapping real world images onto the synthetic model. We want to be able to explore this world in much the same way as one explores the underwater realm when one is actually there, i.e., non-linearly, unconstrained by the path chosen by the wet diver who actually did the videotaping of the reef.

Work on the project can be roughly divided into the following two functionally distinct packages:

Immersion phase:
This is the actual navigation in the virtual world --- diving while keeping your feet dry! This implies close collaboration with the end-users, the marine biologists, and is concerned with 3-D movement in an aquatic environment. While waiting for VR hardware to stabilize at a reasonable price, exploration takes place on a computer screen under 3-D joystick control.

Construction phase:
Of course the 3-D world has to be built, and this is where the hard research problems lie. We take this up in more detail in §3.

As the project is new and initially user-driven, we have concentrated our start-up efforts on determining functionality requirements with the end-users. In this paper however, we present our work on the construction phase. Initially, we are adapting computer vision methods already developed for mobile robotics [3,2,1] to this new problem.

Functionality requirements

The functionality listed below summarizes the requirements gathered through discussions with future users. These informal demands on how the system should ultimately behave are the first building blocks of this research project.

Navigation

The basic consideration in a VR system is of course movement --- the user has to be able to move inside the computer-generated world in a ``realistic'' manner. Movement in this specific case takes place in a viscous medium (water), and a realistic simulation takes this into consideration, for example by restricting the speed with which changes of direction are allowed to take place, as sketched below.
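To make this concrete, the fragment below sketches one way such damping might be implemented: the diver's velocity relaxes toward the commanded speed with a first-order lag, and the rate of change of heading is clamped. The drag coefficient and turn-rate limit are illustrative values only, not taken from the Virtual Diver implementation, and the sketch is restricted to 2-D for brevity.

import math

DRAG = 2.0                      # 1/s, exponential velocity damping (assumed)
MAX_TURN = math.radians(45.0)   # rad/s, maximum heading change rate (assumed)

def step(pos, vel, heading, cmd_speed, cmd_heading, dt):
    """Advance the diver state by dt seconds in a viscous medium."""
    # Clamp the turn rate so direction cannot change instantaneously.
    dh = (cmd_heading - heading + math.pi) % (2 * math.pi) - math.pi
    dh = max(-MAX_TURN * dt, min(MAX_TURN * dt, dh))
    heading += dh

    # First-order lag: vel relaxes toward cmd_speed with time constant 1/DRAG,
    # which mimics drag in water.
    vel += (cmd_speed - vel) * (1.0 - math.exp(-DRAG * dt))

    x, y = pos
    return (x + vel * math.cos(heading) * dt,
            y + vel * math.sin(heading) * dt), vel, heading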

Texture mapping

The crux of the research problem is to map real-world imagery onto the geometry --- once initialized, the CAD model automatically ``wraps itself'' in an uninterrupted sequence of video imagery, see §3 below.

Information database

A longer-term goal of the system is to serve as a teaching / reference aid --- the user, interested by some particular organism on the reef, wishes to have access to further information. Designating the organism would bring up a window in which multi-media hypertext-like information would be made available.

Construction phase

 

Of course the 3-D world has to be built from the structural geometric model and the video images. This involves determining the camera viewpoint from the image sequences in order to map images onto the given geometric structure. Three-dimensional reconstruction from stereo image sequences with a calibrated camera rig is a difficult problem even in the most favourable of circumstances [9], and the results of currently-available systems seem insufficient for the precise detail needed by the marine biologist end-users. However, in the project we have the advantage that the geometric structure already exists, see Fig. 4, and we can use it to constrain matches with the images as in Fig. 5.

  
Figure 4: One face of the detailed CAD model of an artificial reef.

 
Figure 5:  The sequence of video images is mapped onto the geometric structure.

Initialization is done interactively; the system then determines camera motion from a continuous image sequence and maps the images onto the appropriate pose of the model [15,13]. Several different strategies are available as a function of the grain at which the mapping takes place, e.g., at a coarse level, mosaicing whole images onto planar faces of the geometric model (see for example [19,14]), in which holes in the structure are actually images of holes. On a finer level of detail, the geometric structure is used as a ``cookie-cutter'' to select portions of images corresponding only to surfaces of the blocks. The topology of the structure is respected, and the virtual diver will actually be able to penetrate into the holes in the blocks and ``swim'' through the structure.
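As an illustration of the ``cookie-cutter'' idea, the sketch below projects a triangulated model face into the image with a 3x4 camera matrix and computes, from barycentric coordinates, the mask of image pixels that fall inside the projected face; these are the pixels that texture that face. This is a minimal sketch of the principle, not the project's actual code.

import numpy as np

def project(H, X):
    """Project homogeneous 3-D points (N, 4) with a 3x4 camera matrix H."""
    m = (H @ X.T).T
    return m[:, :2] / m[:, 2:3]

def triangle_mask(tri_2d, shape):
    """Boolean mask of image pixels inside a projected triangle,
    computed from barycentric coordinates."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    p = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    a, b, c = tri_2d
    v0, v1, v2 = b - a, c - a, p - a
    d = v0[0] * v1[1] - v0[1] * v1[0]                 # twice the signed area
    u = (v2[:, 0] * v1[1] - v2[:, 1] * v1[0]) / d
    v = (v0[0] * v2[:, 1] - v0[1] * v2[:, 0]) / d
    inside = (u >= 0) & (v >= 0) & (u + v <= 1)
    return inside.reshape(shape)

# Usage: pixels = image[triangle_mask(project(H, face_vertices), image.shape)]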

Computing the camera viewpoint

Inevitably, the first problem encountered in attempting to recover 3-D information about the world from a stereo image or from a sequence of images is that of determining an accurate model for the camera(s) used --- the camera calibration problem. Camera calibration has received much attention in the computer vision literature over the last decade (see [9] for a survey), and common techniques fall into two classes:

Strong calibration
techniques produce the most accurate calibration. They consist of imaging a special-purpose reference calibration target whose structure is known a priori. By matching points of interest in the image with their corresponding vertices in the target, the intrinsic parameters of the camera and the 3-D camera position can be computed. See [11] for details.
Weak calibration
techniques are more and more frequently used in stereovision systems [10]. They do not rely on the use of a calibration target. Using an arbitrary pair of images, stereo matches are computed at a small set of points, and only the epipolar geometry of the camera is computed. Usually, weak calibration techniques are used by systems that do not need to know precisely the extrinsic (position) parameters of the camera [5].

Accurate strong calibration is hard to achieve, principally because very good observation of the calibration target is required, and it has been shown that the quality of the calibration decreases as the distance between the target and the camera increases. In addition, there are practical considerations which can make strong calibration difficult in some cases. Consider, for example, underwater observation, where it is not practical to calibrate the cameras with respect to a special-purpose reference target --- it may simply not be possible to sink the calibration target each time it is necessary to take pictures of an underwater landscape.

On the other hand, weak calibration techniques are still a subject of research and can be somewhat unstable. Furthermore, they do not give explicitly the 3-D position of the camera, which is exactly what we need. Thus, we have decided to use a modified version of the strong calibration techniques (later, we will consider the possible use of weak calibration techniques).

In fact, for our application we have no need of a special-purpose calibration target, since we already have available the 3-D model of the artificial reef. Given an image of the reef, the user can select a small set of points (at least six) corresponding to some distinguished vertices in the 3-D model. Using these matches, the camera parameters can be computed, in particular the 3-D camera position (see App. A for more details of the recovery of the camera parameters).
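The following minimal numpy sketch shows one standard way of solving this problem: each of the n >= 6 matches contributes two linear equations in the 12 entries of the 3x4 camera matrix (see App. A), and the system is solved up to scale by singular value decomposition. Our own implementation follows [11]; this sketch is for illustration only.

import numpy as np

def calibrate_dlt(points_3d, points_2d):
    """Estimate the 3x4 camera matrix H from n >= 6 matches between
    3-D model vertices and image points: each match contributes two
    linear equations in the 12 entries of H (cf. App. A), and the
    homogeneous system is solved up to scale by SVD."""
    A = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    A = np.asarray(A, dtype=float)
    # Null-space solution: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

With exactly six matches the system is square (12 equations in 12 unknowns, solved up to scale); additional matches give a least-squares solution which is more robust to point-localization errors.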

Figure 6 demonstrates the principles involved on a wholly synthetic image (useful for judging accuracy). A cube was modeled and imaged by a synthetic camera with known parameters, and the idea is to recover the position and orientation of the cube from the image. It can be seen that the method performs very well, and this is confirmed quantitatively in Table 1, which compares the parameters of the synthetic camera with those determined by the calibration procedure.

 
Table 1:  Comparison of the camera parameters of the synthetic camera and as determined from the image.

  
Figure 6: Object corners extracted from a synthetic image, and the 3-D model superposed onto the object after calibration.

The next level of difficulty is on what we term ``realistic'' imagery, that is, real images but of man-made, generally rectilinear, environments. This is the situation confronted by mobile robots evolving in building interiors, for example. In fig. 7(a) we show the image of a textured box taken by a video camera in the laboratory (the image is intentionally simplistic; we have created a somewhat more elaborate ``artificial artificial reef'' in the laboratory for running more comprehensive tests), and the six features extracted for calibration. Manual selection of these corners can be subject to error, hence we have used an efficient corner and vertex detector [7] which works to subpixel accuracy directly on the grey levels of the image. The user has only to indicate a rectangular zone containing the corner. Very good results have been obtained on these types of images, as in fig. 7(b), showing a close-up of one of the selected corners. Figure 7(c) shows the superposition of the 3-D model onto the image of the object. The projection has been performed using the calibration matrix computed from the features of the image.
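To illustrate the general principle of subpixel corner localization, the fragment below refines an integer corner location on a corner-response image R (e.g., a Harris response) by fitting a quadratic to its 3x3 neighbourhood and locating the extremum with a Newton step. This is a generic stand-in for exposition, not the Deriche-Giraudon detector [7] used in our system.

import numpy as np

def subpixel_peak(R, i, j):
    """Refine an integer corner location (i, j) to subpixel accuracy by
    fitting a quadratic to the 3x3 neighbourhood of the response image R.
    Assumes R has a well-defined peak at (i, j); a singular Hessian
    would need a guard in practice."""
    # Central-difference gradient and Hessian of R at (i, j).
    gx = (R[i, j+1] - R[i, j-1]) / 2.0
    gy = (R[i+1, j] - R[i-1, j]) / 2.0
    gxx = R[i, j+1] - 2*R[i, j] + R[i, j-1]
    gyy = R[i+1, j] - 2*R[i, j] + R[i-1, j]
    gxy = (R[i+1, j+1] - R[i+1, j-1] - R[i-1, j+1] + R[i-1, j-1]) / 4.0
    H = np.array([[gxx, gxy], [gxy, gyy]])
    # Newton step: the extremum of the fitted quadratic is at -H^{-1} g.
    dx, dy = -np.linalg.solve(H, np.array([gx, gy]))
    return j + dx, i + dy            # (x, y) with subpixel offset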

  
Figure 7: (a) Object corners in a real image. (b) Zoom of one of the corners. (c) Wireframe model superposed onto the real object in the image. The camera position was determined from the calibration parameters computed from the image.

The real-world case is of course much more difficult than either the synthetic or the realistic situations shown above. It cannot be guaranteed that the reefs that have been sunk in the reserve still correspond exactly to the 3-D model of the initial design. Furthermore, the colonization of the reefs by underwater life-forms has altered their shape, e.g., the edges of the blocks that compose the reefs are covered by concretions, and sharp corners can be difficult to distinguish, see figs. 3 and 8.

  
Figure 8: Schematic of block corner showing colonization by brown and green algae. Courtesy of AMPN .

Mapping images onto the 3-D model

Once the camera position has been determined relative to the image, a correspondence is established between the geometric structure and the image, i.e., which faces of the model correspond to which regions of the image. This allows us to texture-map the image pixels onto the model. We use a novel technique based on an idea from computer graphics and image synthesis, that of a Z-buffer [8]. A detailed presentation of the algorithm is beyond the scope of this paper (the interested reader is referred to [8]); however, we give an example of how it works.
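The fragment below gives the flavour of the approach: the projected model triangles are rasterized into a Z-buffer so that each image pixel retains the identity of the nearest (hence visible) face; the pixels attached to a face then constitute its texture. This is a simplified sketch of the idea, using screen-space (affine) depth interpolation for brevity; the actual algorithm is described in [8].

import numpy as np

def face_buffer(faces_3d, H, shape):
    """Rasterize projected model triangles into a Z-buffer, keeping for
    each image pixel the id of the nearest face. Each pixel then belongs
    unambiguously to one visible face, whose texture it provides."""
    zbuf = np.full(shape, np.inf)
    fbuf = np.full(shape, -1, dtype=int)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    pix = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    for fid, tri in enumerate(faces_3d):          # tri: (3, 3) vertex array
        Xh = np.hstack([tri, np.ones((3, 1))])    # homogeneous coordinates
        m = (H @ Xh.T).T
        depth = m[:, 2]                           # projective depth per vertex
        p2 = m[:, :2] / m[:, 2:3]                 # projected vertices
        a, b, c = p2
        v0, v1 = b - a, c - a
        d = v0[0]*v1[1] - v0[1]*v1[0]
        if abs(d) < 1e-12:
            continue                              # degenerate projection
        v2 = pix - a
        u = (v2[:, 0]*v1[1] - v2[:, 1]*v1[0]) / d
        v = (v0[0]*v2[:, 1] - v0[1]*v2[:, 0]) / d
        inside = ((u >= 0) & (v >= 0) & (u + v <= 1)).reshape(shape)
        # Interpolate depth barycentrically and apply the Z-test.
        z = ((1 - u - v)*depth[0] + u*depth[1] + v*depth[2]).reshape(shape)
        win = inside & (z < zbuf)
        zbuf[win] = z[win]
        fbuf[win] = fid
    return fbuf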

Figure 9(a) shows an image of a textured test object taken in the laboratory with a superposed wireframe showing a triangulation of its faces, and 9(b) is the object seen from a different point of view. The crucial difference is that 9(b) is not an image taken with the camera from a different position, but a different view of the synthetic object onto which the texture extracted from the image has been mapped.

  
Figure 9: (a) Camera image with superposed wireframe. (b) Synthetic object with texture mapping.

Figure 10(a) shows the Z-buffer mapping in progress on the end panel of the box, and 10(b) the two triangular textures extracted from the image of the box.

  
Figure 10: (a) Texture mapping interruptus. (b) Triangular end panel texture patches extracted from the image of the box.

Figure 11 shows an image mapped onto the reef model of fig. 4.

 
Figure 11:  The 3-D model of a portion of the artificial reef is shown superposed onto the image.

Conclusion

The project is new: we have just completed the design-specification stage in consultation with the marine biologist end-users, and we are beginning implementation of the interface and the navigational package. The low-level system will be based on RenderWare from Criterion Software [18], onto which we are currently building a more easily usable object-oriented C++ interface [16].

In parallel, and as presented in this paper, we are working on the problem of accurately creating the virtual world by adapting to this new problem technology which we have already developed for 3-D reconstruction applied to mobile robotics [4,3,2].

While user-driven for the underwater environment, we expect that the techniques developed will be general enough to be applied to other problem domains where a need for geometric / photometric data-fusion exists, architectural systems and digital terrain mapping to name but two.

Acknowledgements

The authors wish to thank particularly Jean de Vaugelas of AquaScience for inspiring the project and for videotaping the underwater reefs. David Luquet of AquaScience is responsible for the excellent quality still photographs. Thanks also to AMPN for permission to dive in the Underwater Reserve of Monaco.

References

1
Minoru Asada, Masahiro Kimura, Yasuhiro Taniguchi, and Yoshiaki Shirai. Dynamic integration of height maps into 3d world representation from range image sequences. International Journal of Computer Vision, vol. 9:1, pages 31--53, 1992.

2
M. Buffa. Navigation d'un robot mobile à l'aide de la stéréovision et de la triangulation de Delaunay. Ph.D. thesis, Université de Nice, June 1993.

3
M. Buffa, O.D. Faugeras, and Z. Zhang. A complete navigation system for a mobile robot, using real-time stereovision and the Delaunay triangulation. In Proceedings of the MVA Workshop, pages 191--194, Tokyo, Japan, December 1992.

4
M. Buffa, O.D. Faugeras, and Z. Zhang. Obstacle avoidance and trajectory planning for an indoor mobile robot using stereo vision and Delaunay triangulation. In I. Masaki, editor, Vision-based Vehicle Guidance, chapter 13, pages 268--283. Springer, New York, 1992.

5
M. Buffa, L. Robert, and M. Hebert. Weakly-calibrated stereo perception for rover navigation. In Proceedings of the IEEE ICCV 1995 conference (to appear), 1995.

6
Eugène Debernardi. Design and construction of artificial reefs. Memorandum for the Fourth International Conference on Artificial Habitats for Fisheries, November 1987.

7
Rachid Deriche and Gerard Giraudon. A computational approach for corner and vertex detection. International Journal of Computer Vision, vol. 10:2, pages 101--124, 1993.

8
Franck Diard, Michel Buffa, and Peter Sander. Texture-mapping based on the Z-buffer. Technical report, Laboratoire I3S---CNRS, B.P. 145, 06903 Sophia-Antipolis cedex, France, 1995. In preparation.

9
O.D. Faugeras. Three-Dimensional Computer Vision: a Geometric Viewpoint. MIT Press, 1993.

10
O.D. Faugeras, Q.T. Luong, and S.J. Maybank. Camera self-calibration: theory and experiments. In Proceedings of the European Conference on Computer Vision, pages 321--334, Santa Margherita Ligure, Italy, May 1992.

11
O.D. Faugeras and G. Toscani. The calibration problem for stereo. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, pages 15--20, Miami Beach, Florida, 1986.

12
Pascal Fua and Yvan G. Leclerc. Object-centered surface reconstruction: Combining multi-image stereo and shading. International Journal of Computer Vision. In the press.

13
Donald B. Gennery. Visual tracking of known three-dimensional objects. International Journal of Computer Vision, vol. 7:3, pages 243--270, 1992.

14
Richard I. Hartley. Self-calibration from multiple views with a rotating camera. In Proceedings of ECCV 94, pages 471--478, 1994.

15
David G. Lowe. Robust model-based motion tracking through the integration of search and estimation. International Journal of Computer Vision, vol. 8:2, pages 113--122, 1992.

16
Mats Persson. Design and implementation of the virtual diver. Master's thesis, Department of Computer and Information Science, Linköping University, S-581 83 Linköping, Sweden, December 1994.

17
W.R. Pickering. Merging 3-d graphics and imaging---applications and issues. Computer Graphics (Siggraph'93), pages 395--396, August 1993.

18
Criterion Software. Renderware API Reference Manual V1.3. 1994.

19
Richard Szeliski. Image mosaicing for tele-reality applications. Technical Report CRL 94/2, Cambridge Research Laboratory, Digital Equipment Corp., May 1994.

20
J. Weng, Thomas S. Huang, and N. Ahuja. Motion and Structure from Image Sequences. Springer-Verlag, Berlin, 1993.

21
Z. Zhang and O.D. Faugeras. Building a 3d world model with a mobile robot: 3d line segment representation and integration. In Proceedings of the 10th IEEE International Conference on Pattern Recognition, Atlantic City, New Jersey, June 1990.

Camera Calibration Details

 

We can only provide a brief overview of the essential camera calibration step; the interested reader is referred to [11,8].

In homogeneous coordinates, we write $M = (X, Y, Z, W)^\top$ for a 3-D point and $m = (U, V, S)^\top$ for a 2-D point. Projecting from 3-D space to 2-D space using the standard pin-hole camera model of image formation gives the relation

\[ m' = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} M_c \qquad (1) \]

where $f$ is the focal length and $M_c$ denotes the point in camera-centered coordinates.

The relation between the projected point $m'$ and the image point $m$ is given by the simple affine relation

\[ m = \begin{pmatrix} k_u & 0 & u_0 \\ 0 & k_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} m' \qquad (2) \]

where $k_u, k_v$ are the pixel scale factors and $(u_0, v_0)$ is the principal point.

Combining eqns. (1,2) gives the relation between 3-D point M and image point m:

\[ m = P M_c, \qquad P = \begin{pmatrix} \alpha_u & 0 & u_0 & 0 \\ 0 & \alpha_v & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \qquad (3) \]

with $\alpha_u = f k_u$ and $\alpha_v = f k_v$. The parameters $\alpha_u, \alpha_v, u_0, v_0$ represent the intrinsic parameters of the camera.

The camera is positioned at $C$ in world coordinates and oriented with its axes given in world coordinates by the rows of a rotation matrix $R$. Thus

\[ M_c = D M, \qquad D = \begin{pmatrix} R & -RC \\ 0^\top & 1 \end{pmatrix} \qquad (4) \]

is the mapping from 3-D world to 3-D camera coordinates. The extrinsic camera parameters are the camera position and axes.

Combining eqs. (3,4) gives the complete camera model for image formation,

\[ m = P D M = H M. \qquad (5) \]

It remains to determine the $3 \times 4$ matrix $H = (h_{ij})$. Let $W = 1$ for the homogeneous coordinates of $M$ so that

\[ \begin{pmatrix} U \\ V \\ S \end{pmatrix} = H \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}. \qquad (6) \]

Expanding eq. (6) and substituting the Cartesian form of the image coordinates, i.e., $u = U/S$, $v = V/S$, yields the two equations

\[ h_{11} X + h_{12} Y + h_{13} Z + h_{14} - u \, (h_{31} X + h_{32} Y + h_{33} Z + h_{34}) = 0 \]
\[ h_{21} X + h_{22} Y + h_{23} Z + h_{24} - v \, (h_{31} X + h_{32} Y + h_{33} Z + h_{34}) = 0 \]

in the 12 unknowns $h_{ij}$. It is thus necessary to determine correspondences between 6 points in the image and on the calibration target, and then to solve for the unknowns by standard numerical methods. It is then reasonably straightforward to recover the camera parameters from the matrix $H$ [11].
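As an illustration of this last step, the sketch below recovers the intrinsic matrix, rotation, and camera position from the $3 \times 4$ matrix via RQ decomposition, which is one standard route; the decomposition actually referenced is that of [11], and sign/handedness checks are abbreviated here.

import numpy as np
from scipy.linalg import rq

def decompose(Hm):
    """Split a 3x4 camera matrix into intrinsics A, rotation R, and
    camera position C, with H ~ A [R | -RC]. RQ decomposition is one
    standard route; a full implementation would also verify det(R) = +1."""
    Hm = Hm / np.linalg.norm(Hm[2, :3])   # fix the arbitrary scale of H
    A, R = rq(Hm[:, :3])                  # upper-triangular A, orthogonal R
    S = np.diag(np.sign(np.diag(A)))      # make the intrinsics positive
    A, R = A @ S, S @ R
    t = np.linalg.solve(A, Hm[:, 3])
    C = -R.T @ t                          # camera position in world coords
    return A, R, C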
