
Polyhedral Visual Hulls

- Recovering 3D structure from 2D images

This page describes a system I built as part of my Master's thesis. The system reconstructs 3D models of objects from a limited set of 2D images of the object, an area of computer graphics and vision with particular applicability to virtual environments, gaming and cinematic media. It is hoped that the system will be developed further by integrating it with current motion-capture systems to supply surface models of subjects. This would eliminate a substantial amount of the post-process model fitting that is currently necessary.
 
The system uses the 'shape from silhouette' technique to recover 3D structure. This requires that we have some arrangement of (calibrated) cameras imaging the scene:
 
 
In the above setup, the cameras are raised by 20° from the horizontal. This produces the following images:
 
[Images: cube/sphere views 2, 4, 3 and 1]
 
First, the silhouette of the object as seen from each camera is extracted using a technique called marching squares. The resulting contour consists of a large number of points, and every point adds to the cost of the later calculations, so the contour is simplified using a technique called decimation. Decimation only works on sets of triangles, so Delaunay triangulation is applied first to turn the contour into triangles. The triangle set can then be decimated.
 
contouring
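The contour extraction step can be illustrated with a minimal marching squares sketch. This is not the thesis code: it assumes the silhouette is already available as a binary mask (1 for object, 0 for background), places crossings at edge midpoints, and returns unordered line segments rather than a connected contour. The names `CASES` and `marching_squares` are made up for this example, and the handling of the two ambiguous "saddle" cells is an arbitrary choice.

```python
# Edge crossings for each of the 16 corner patterns of a grid cell.
# Corner bits: top-left = 8, top-right = 4, bottom-right = 2, bottom-left = 1.
# T, R, B, L name the top, right, bottom and left edges of the cell.
# The saddle cases (5 and 10) keep the two set corners separate, which is
# an arbitrary choice made for this sketch.
CASES = {
    0: [], 15: [],
    1: [("L", "B")], 14: [("L", "B")],
    2: [("B", "R")], 13: [("B", "R")],
    3: [("L", "R")], 12: [("L", "R")],
    4: [("T", "R")], 11: [("T", "R")],
    6: [("T", "B")], 9: [("T", "B")],
    7: [("T", "L")], 8: [("T", "L")],
    5: [("T", "R"), ("L", "B")],
    10: [("T", "L"), ("B", "R")],
}


def marching_squares(mask):
    """Return contour segments ((x1, y1), (x2, y2)) for a binary mask."""
    segments = []
    for r in range(len(mask) - 1):
        for c in range(len(mask[0]) - 1):
            # Classify the cell by which of its four corners are set.
            case = (8 * mask[r][c] + 4 * mask[r][c + 1]
                    + 2 * mask[r + 1][c + 1] + mask[r + 1][c])
            # Midpoints of the four cell edges, as (x, y) coordinates.
            mid = {"T": (c + 0.5, r), "B": (c + 0.5, r + 1),
                   "L": (c, r + 0.5), "R": (c + 1, r + 0.5)}
            for a, b in CASES[case]:
                segments.append((mid[a], mid[b]))
    return segments


# A single object pixel in the middle of a 3x3 image.
mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
segments = marching_squares(mask)
```

For this tiny mask the result is four segments forming a diamond around the centre pixel. A real silhouette image produces a far denser contour, which is why the simplification step matters.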
 
A line is drawn from each camera centre through each of the points on that camera's silhouette contour. The lines form the edges of a cone-like structure called a silhouette cone...
 
silhouette geometry
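As a rough sketch of how the cone edges could be computed, the snippet below back-projects contour pixels into 3D rays using a simple pinhole camera model. The parameter names (`f` for focal length in pixels, `cx`, `cy` for the central pixel point, `R` for the world-to-camera rotation, `C` for the camera centre) and the functions `pixel_ray` and `silhouette_cone` are assumptions made for illustration, not the calibration format used in the thesis.

```python
def pixel_ray(pixel, f, cx, cy, R, C):
    """Return (origin, direction) of the world-space ray through a pixel."""
    u, v = pixel
    # Direction of the ray in the camera's own coordinate frame.
    d_cam = ((u - cx) / f, (v - cy) / f, 1.0)
    # R maps world to camera, so its transpose maps camera to world.
    d_world = tuple(sum(R[k][i] * d_cam[k] for k in range(3)) for i in range(3))
    return tuple(C), d_world


def silhouette_cone(contour, f, cx, cy, R, C):
    """One ray per contour point; consecutive rays bound one cone face."""
    return [pixel_ray(p, f, cx, cy, R, C) for p in contour]


# Hypothetical camera at the world origin looking along +z.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
contour = [(420.0, 240.0), (320.0, 340.0), (220.0, 240.0)]
cone = silhouette_cone(contour, f=500.0, cx=320.0, cy=240.0,
                       R=IDENTITY, C=(0.0, 0.0, 0.0))
```

Every ray starts at the camera centre, so each pair of neighbouring rays spans a triangular face that extends away from the camera.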
 
All that needs to be done now is to find the intersection of all the silhouette cones. This intersection is called the 'visual hull', and it is the closest approximation to the original object's surface that can be obtained from the silhouettes alone.
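One way to make the idea of a cone intersection concrete: a 3D point lies inside the visual hull exactly when it projects inside the silhouette in every view. The sketch below tests that condition. To keep it short it uses two made-up orthographic views rather than calibrated perspective cameras, and the names `point_in_polygon`, `in_visual_hull` and `views` are hypothetical.

```python
def point_in_polygon(p, polygon):
    """Even-odd ray casting test of a 2D point against a closed polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_visual_hull(point, views):
    """True if the point projects inside the silhouette of every view."""
    return all(point_in_polygon(project(point), silhouette)
               for project, silhouette in views)


# Two toy orthographic views of a unit cube, each with a square silhouette.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
views = [
    (lambda p: (p[0], p[1]), square),  # looking along the z axis
    (lambda p: (p[1], p[2]), square),  # looking along the x axis
]
centre_inside = in_visual_hull((0.5, 0.5, 0.5), views)
far_point_inside = in_visual_hull((0.5, 0.5, 2.0), views)
```

The centre of the cube passes both silhouette tests, while the second point is rejected by the view along the x axis.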
 
Other similar systems perform this intersection by dividing the volume of interest into small cubes called voxels. They then go through each cone in turn and work out which cubes lie outside the cone. In this way a volume of cubes is cut away. This is called 'space carving'. It is an efficient method, but it is not particularly accurate.

The approach taken here uses a geometric property called the 'epipolar constraint' to reduce the intersection calculation to 2D, which is much less complicated than doing it in 3D. The useful property of epipolar geometry is that a point seen in 2D on one camera's image plane projects to a line on another camera's image plane.

Have another look at the diagram above and consider a second camera's silhouetted view of the object. You can imagine that if the image plane in the diagram were large enough, the second camera's centre and silhouette cone would be visible in it. The cone would show as a flattened cone, or pencil of lines, on the image plane and would overlay some or all of the original silhouette. Each camera projects as a different pencil of lines, with the point of the pencil at that camera's centre. Knowing the focal length and central pixel point of each camera, together with the relative positions of the cameras, is enough to describe how each pencil will project onto an image plane. The diagram below shows how the silhouette cones of the other 3 cameras project onto the image plane of camera 3.
 
silhouette cones
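The projection of one cone edge into another view can be sketched as follows. The ray through a pixel in camera A projects into camera B as a line through the image of A's centre (the epipole), so two projected points are enough to define it. The camera dictionaries, their field names and the functions `project`, `back_project` and `epipolar_line` are assumptions for this example; it also assumes the sampled point on the ray lies in front of camera B.

```python
import math


def project(X, cam):
    """Project world point X into a pinhole camera; returns pixel (u, v)."""
    rel = [X[i] - cam["C"][i] for i in range(3)]
    xc = [sum(cam["R"][r][i] * rel[i] for i in range(3)) for r in range(3)]
    return (cam["f"] * xc[0] / xc[2] + cam["cx"],
            cam["f"] * xc[1] / xc[2] + cam["cy"])


def back_project(pixel, cam, depth):
    """World point seen at `pixel`, `depth` units along the optical axis."""
    d = ((pixel[0] - cam["cx"]) / cam["f"],
         (pixel[1] - cam["cy"]) / cam["f"], 1.0)
    return tuple(cam["C"][i]
                 + depth * sum(cam["R"][k][i] * d[k] for k in range(3))
                 for i in range(3))


def epipolar_line(pixel, cam_a, cam_b, depth=1.0):
    """Line (a, b, c), with a*u + b*v + c = 0, in camera B's image."""
    e = project(cam_a["C"], cam_b)  # epipole: image of A's centre in B
    p = project(back_project(pixel, cam_a, depth), cam_b)
    a, b = e[1] - p[1], p[0] - e[0]
    c = e[0] * p[1] - p[0] * e[1]
    n = math.hypot(a, b)  # normalise so residuals are in pixels
    return (a / n, b / n, c / n)


# Hypothetical pair: A at the origin looking along +z, B looking along -x.
cam_a = {"f": 500.0, "cx": 320.0, "cy": 240.0, "C": (0.0, 0.0, 0.0),
         "R": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
cam_b = {"f": 500.0, "cx": 320.0, "cy": 240.0, "C": (5.0, 0.0, 5.0),
         "R": [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]}
pixel = (420.0, 290.0)
line = epipolar_line(pixel, cam_a, cam_b)

# Any other point on the same ray should project onto the same line.
other = project(back_project(pixel, cam_a, 3.0), cam_b)
residual = line[0] * other[0] + line[1] * other[1] + line[2]
```

Running this for every contour point of camera A gives the pencil of lines described above, all meeting at the epipole. In this example the epipole falls outside a typical image area, which matches the remark about needing a large enough image plane.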
 
Now consider how the silhouette cone from camera 3 projects onto the image planes of the other 3 cameras, as shown below. That one cone now crosses the other 3 silhouettes. Each triangular cone face overlaps an area of the silhouette in each of the other image planes; these areas are shown as red polygons.
 
intersection polygons
 
So now one cone face from camera 3 has an intersection polygon in each of cameras 1, 2 and 4. If we project lines through the vertices of each of these polygons back to where they intersect silhouette cone 3 in 3D, we have 3 more or less overlapping polygons lying on that cone face. Calculating the intersection of these polygons gives us a face of the final 3D polyhedral visual hull. Do this for all the cone faces of camera 3 and we have a set of polygons in 3D which represent the part of the original object that lies on silhouette cone 3. Repeat this for all 4 cameras and we end up with the final complete structure.
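The final polygon intersection can be sketched with Sutherland-Hodgman clipping, a technique chosen here for brevity rather than taken from the thesis. It assumes the three polygons have already been expressed in 2D coordinates within the plane of the cone face, and that each clipping polygon is convex with counter-clockwise vertices. Polygons derived from real silhouettes can be non-convex and would need a more general clipping method. The function names and the example polygons are made up.

```python
def clip_polygon(subject, clip):
    """Intersect `subject` with a convex, counter-clockwise `clip` polygon."""
    def inside(p, a, b):
        # True if p lies to the left of, or on, the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p-q meets the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output


def polygon_area(polygon):
    """Unsigned area by the shoelace formula."""
    total = 0.0
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Three overlapping polygons standing in for those from cameras 1, 2 and 4.
poly_1 = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
poly_2 = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
poly_4 = [(0.5, 0.5), (2.5, 0.5), (2.5, 1.5), (0.5, 1.5)]
face = clip_polygon(clip_polygon(poly_1, poly_2), poly_4)
face_area = polygon_area(face)
```

The polygons are intersected one after another, so the order does not affect the result for convex inputs. An empty list means the cone face contributes nothing to the hull.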
 
See my thesis report (thesis pdf, thesis gzipped postscript) for a detailed discussion. Click on the output images below for a short video tour around the reconstructed object. The polygons belonging to each of the 4 cameras' silhouette cones are coloured differently. The left video is 2.8MB, the right is 1.3MB.
 
cube/sphere video (2.8MB) cube/sphere video (1.3MB)
