Polyhedral Visual Hulls
- Recovering 3D structure from 2D images
This page describes a system I built as part of my Master's thesis.
The system reconstructs 3D models of objects from a limited set of
2D images of the object - an area in computer graphics/vision that has
particular applicability to virtual environments, gaming and
cinematic media. It is hoped that the system will be developed further
by integrating it with current motion-capture systems to supply surface
models of subjects. This would eliminate a substantial amount of the
post-process model fitting that is currently necessary.
The system uses the 'shape from silhouette' technique to recover 3D
structure. This requires that we have some arrangement of
(calibrated) cameras imaging the scene:
In the above setup, the cameras are raised up by 20° to the
horizontal. This produces the following images:
First, the silhouette of the object as seen from each camera is
extracted using a technique called marching squares. The resulting
contour contains a large number of points, and every extra point means
more calculation later, so it is simplified using a technique called
decimation. Decimation only works on sets of triangles, so Delaunay
triangulation is applied first to turn the contour into triangles.
The triangle set can then be decimated.
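
To give a feel for the contour extraction step, here is a minimal marching squares sketch in Python. It is not the thesis code: the function name, the list-of-rows mask format and the way the two ambiguous "saddle" cases are resolved are all assumptions made for illustration. It only extracts contour segments and does not attempt the triangulation or decimation steps.

```python
# Segment lookup for the 16 marching squares cases. Corner bits are
# TL=8, TR=4, BR=2, BL=1; each entry lists the pairs of cell edges to join.
CASES = {
    0: [], 15: [],
    1: [("left", "bottom")], 14: [("left", "bottom")],
    2: [("bottom", "right")], 13: [("bottom", "right")],
    3: [("left", "right")], 12: [("left", "right")],
    4: [("top", "right")], 11: [("top", "right")],
    6: [("top", "bottom")], 9: [("top", "bottom")],
    7: [("top", "left")], 8: [("top", "left")],
    # Ambiguous saddle cases: this sketch assumes the two inside
    # corners belong to separate regions.
    5: [("top", "right"), ("left", "bottom")],
    10: [("top", "left"), ("bottom", "right")],
}


def marching_squares(mask):
    """Return contour segments ((x0, y0), (x1, y1)) for a binary mask.

    mask is a list of rows holding 0 (background) or 1 (object).
    """
    segments = []
    for r in range(len(mask) - 1):
        for c in range(len(mask[0]) - 1):
            # Build the 4-bit case index from the cell's corner samples.
            case = (
                (mask[r][c] << 3)
                | (mask[r][c + 1] << 2)
                | (mask[r + 1][c + 1] << 1)
                | mask[r + 1][c]
            )
            # Midpoints of the four cell edges, in (x, y) pixel coordinates.
            mid = {
                "top": (c + 0.5, r),
                "right": (c + 1, r + 0.5),
                "bottom": (c + 0.5, r + 1),
                "left": (c, r + 0.5),
            }
            for a, b in CASES[case]:
                segments.append((mid[a], mid[b]))
    return segments
```

A single object pixel in a 3x3 mask, for example, produces four segments forming a small diamond around that pixel. Chaining the segments into an ordered contour is left out to keep the sketch short.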
For each camera, a line is drawn from the camera centre through each of
the points on its silhouette contour. The lines form the edges of a
cone-like structure called a silhouette cone.
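
The construction of a silhouette cone can be sketched as follows. This assumes a simple pinhole camera with a focal length f in pixels, a central pixel point (cx, cy), a world-to-camera rotation matrix R and a camera centre, with the convention x_cam = R (X_world - centre). The function names and argument layout are hypothetical and are not taken from the thesis code.

```python
def transpose_mul(R, v):
    """Multiply the transpose of the 3x3 matrix R by the 3-vector v."""
    return tuple(sum(R[k][i] * v[k] for k in range(3)) for i in range(3))


def silhouette_cone_rays(contour, f, cx, cy, R, centre):
    """Return one (origin, direction) ray per contour pixel.

    Assumes a pinhole camera with x_cam = R (X_world - centre).
    Directions are in world coordinates and are not normalised.
    """
    rays = []
    for u, v in contour:
        # Direction of the pixel's viewing ray in the camera frame.
        d_cam = ((u - cx) / f, (v - cy) / f, 1.0)
        # Rotate the direction back into the world frame.
        rays.append((centre, transpose_mul(R, d_cam)))
    return rays


def cone_faces(rays):
    """Pair consecutive rays of a closed contour into cone faces."""
    return [(rays[i], rays[(i + 1) % len(rays)]) for i in range(len(rays))]
```

Each face returned by cone_faces is bounded by two neighbouring rays that share the camera centre, so it is a triangle that extends away from the camera without limit.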
All that needs to be done now is to find the intersection of all the
silhouette cones. This is called the 'visual hull', and it is the
closest approximation to the original object's surface that can be
obtained from the silhouettes alone.
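
One way to read this definition is as a membership test: a 3D point is inside the visual hull only if it projects inside the silhouette in every camera. The sketch below illustrates that idea and nothing more. The camera dictionary layout (keys f, cx, cy, R, centre, silhouette) is an assumption made for this example, and the silhouette is taken to be a simple polygon in pixel coordinates.

```python
def project(point, cam):
    """Pinhole projection of a world point, or None if it is behind the camera.

    Assumes x_cam = R (X_world - centre) and a hypothetical camera dict.
    """
    rel = [point[i] - cam["centre"][i] for i in range(3)]
    x, y, z = (sum(cam["R"][r][k] * rel[k] for k in range(3)) for r in range(3))
    if z <= 0:
        return None
    return (cam["f"] * x / z + cam["cx"], cam["f"] * y / z + cam["cy"])


def point_in_polygon(p, poly):
    """Even-odd ray casting test for a point against a simple polygon."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x0, y0), (x1, y1) = poly[i], poly[i - 1]
        # Count edges that cross the horizontal ray going right from p.
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside


def in_visual_hull(point, cameras):
    """True if the point projects inside every camera's silhouette."""
    for cam in cameras:
        uv = project(point, cam)
        if uv is None or not point_in_polygon(uv, cam["silhouette"]):
            return False
    return True
```

A test like this says whether a given point is inside the hull, but it does not produce the hull's surface, which is what the rest of this page is concerned with.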
Other similar systems perform this intersection by dividing the
volume of interest up into small cubes called voxels. They then go
through each cone in turn and work out which cubes lie outside the
cone. In this way a volume of cubes is cut away. This is called
'space carving'. It is an efficient method, but it is not
particularly accurate, because the result can never be finer than the
voxel grid.

The approach taken here instead uses a geometric property called the
'epipolar constraint' to reduce the intersection calculation to 2D,
which is much less complicated than trying to do it in 3D. The useful
property of epipolar geometry is that a point seen in 2D on one
camera's image plane projects to a line on another camera's image
plane. Have a look at the diagram above again and consider a second
camera's silhouetted view of the object. If the image plane in the
diagram were large enough, the second camera's centre and silhouette
cone would be visible in it. The cone would show as a flattened cone,
or pencil of lines, on the image plane and would overlay some or all
of the original silhouette. Each camera projects as a different pencil
of lines, with the point of the pencil at that camera's centre.
Knowing the focal length and central pixel point of each camera,
together with the relative positions of the cameras, is enough to
describe how each pencil will project onto an image plane. The diagram
below shows how the silhouette cones of the other 3 cameras project
onto the image plane of camera 3.
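
The projection of one cone edge into another camera's image can be sketched with homogeneous coordinates. The example assumes a pinhole camera stored as a dictionary with keys f, cx, cy, R and centre, using the convention x_cam = R (X_world - centre). Those names are hypothetical and chosen only for this illustration. The projected line passes through two image points: the projection of the ray's origin (the other camera's centre, which is the point of the pencil) and the projection of the ray's direction.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def to_image(vec, cam):
    """Map a world-frame vector to homogeneous pixel coordinates (K R vec)."""
    x, y, z = (sum(cam["R"][r][k] * vec[k] for k in range(3)) for r in range(3))
    return (cam["f"] * x + cam["cx"] * z, cam["f"] * y + cam["cy"] * z, z)


def epipolar_line(ray_origin, ray_dir, cam):
    """Return the homogeneous line (a, b, c) of a 3D ray in cam's image.

    Every pixel (u, v) on the projected ray satisfies a*u + b*v + c = 0.
    """
    # Image of the ray origin relative to this camera's centre.
    offset = [ray_origin[i] - cam["centre"][i] for i in range(3)]
    apex = to_image(offset, cam)
    # Image of the ray's direction (its vanishing point).
    vanishing = to_image(ray_dir, cam)
    # The line through two homogeneous points is their cross product.
    return cross(apex, vanishing)
```

Calling epipolar_line once for every edge of another camera's silhouette cone gives the pencil of lines described above, since all of the lines share the same apex.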
Now consider how the silhouette cone from camera 3 projects onto the
image planes of the other 3 cameras, as shown below. That one cone now
crosses the other 3 silhouettes. Each triangular cone face overlaps
part of the silhouette in each of the other image planes, and these
overlapping areas are shown as red polygons.
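
A projected cone face appears in another image as a wedge between two lines of the pencil, so finding each red polygon amounts to clipping a silhouette against a wedge. The sketch below uses Sutherland-Hodgman style half-plane clipping, which is a swapped-in textbook technique and not necessarily what the thesis implementation does. It assumes the wedge angle is less than 180 degrees, that p2 lies counter-clockwise from p1 around the apex (in axes where y points up), and all names are hypothetical. For strongly concave silhouettes this simple method can return degenerate edges.

```python
def clip_half_plane(poly, a, b):
    """Keep the part of poly lying on the left of the directed line a -> b."""

    def side(p):
        # Positive when p is to the left of a -> b, negative when to the right.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    out = []
    for i in range(len(poly)):
        p, q = poly[i - 1], poly[i]
        sp, sq = side(p), side(q)
        if (sp >= 0) != (sq >= 0):
            # The edge p -> q crosses the line: add the crossing point.
            t = sp / (sp - sq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
        if sq >= 0:
            out.append(q)
    return out


def clip_to_wedge(silhouette, apex, p1, p2):
    """Clip a silhouette polygon to the wedge between rays apex->p1 and apex->p2.

    Assumes p2 is counter-clockwise from p1 and the wedge spans < 180 degrees.
    """
    kept = clip_half_plane(silhouette, apex, p1)
    return clip_half_plane(kept, p2, apex)
```

Clipping a square centred on the apex to a quarter-plane wedge, for instance, returns the quarter of the square that lies inside the wedge.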
So now one cone face from camera 3 has an intersection polygon
in each of cameras 1, 2 and 4. If we now project lines through each
polygon's vertices in cameras 1, 2 and 4 back to where they
intersect silhouette cone 3 in 3D, we have 3 more or less
overlapping polygons. Calculating the intersection of these polygons
gives us a face in the final 3D polyhedral visual hull. Doing this for
all the cone faces in camera 3 gives a set of polygons in 3D
which represent the part of the original object which lies on
silhouette cone 3. Repeating this for all 4 cameras gives
the final complete structure.
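
The step of projecting a polygon vertex back onto a cone face is a ray-plane intersection, which can be sketched as below. The cone face is assumed to lie in the plane through its camera centre spanned by its two edge directions. The function and parameter names are hypothetical, and the sketch does not check that the hit point falls between the two edges of the face.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def lift_to_cone_face(origin, direction, face_apex, face_dir_a, face_dir_b):
    """Intersect a viewing ray with the plane containing a cone face.

    Returns the 3D hit point, or None if the ray is parallel to the plane.
    """
    # The face plane passes through the apex and contains both edge directions.
    normal = cross(face_dir_a, face_dir_b)
    denom = dot(normal, direction)
    if abs(denom) < 1e-12:
        return None
    to_apex = tuple(face_apex[i] - origin[i] for i in range(3))
    t = dot(normal, to_apex) / denom
    return tuple(origin[i] + t * direction[i] for i in range(3))
```

Lifting every vertex of a 2D intersection polygon this way gives a polygon lying in the plane of the cone face. Intersecting the lifted polygons from the different cameras within that plane is not shown here.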
See my thesis report for a detailed discussion. Click on an output
image below for a short video tour around the reconstructed object.
The polygons from each of the 4 cameras' silhouette cones are coloured
differently. The left video is 2.8MB, the right is 1.3MB.