Shader-Based Stereo Matching with Local Algorithms

Master's thesis in Computer Science by Karl Jonsson, carried out during 2002-2003 and presented 2003-09-15
Mentor: Lennart Ohlsson, Department of Computer Science, Lund University

Download

Documentation
Abstract (pdf)
Report (pdf)

Programs (zip)
VirtualSource
D3DFilter
StereoRenderer

Movie clips (zip)
stereo
disparity
fog

Short presentation

Modern graphics cards have created new possibilities within the area of image processing. They are much more efficient than the central processing unit (CPU) at graphical operations such as transformations and lighting, and are currently considered to be about 200 times faster on these calculations. Furthermore, the graphics processing unit (GPU) is evolving faster than the CPU. One of the latest innovations is the integration of programmable Shaders into the graphics pipeline on the graphics cards. These Shaders increase the flexibility when programming effects, primarily for use in computer games, but at the same time create opportunities for other uses.

Computer vision and image processing often require great computing capacity, since calculations involving images are very demanding. Operations on images are also performed in similar ways within computer graphics. It has been shown that simple operations from image processing can be carried out on the GPU. The purpose of this master's thesis was to examine whether graphics hardware is also suited for more complex image processing and computer vision. A disadvantage is that not all operations can be implemented efficiently on the GPU.

Stereo matching was chosen as the subject for the examination. The objective of the matching is to find the connection (disparity) between two different images taken of the same scene at different angles. From this connection the scene can be reconstructed. This is basically the same thing that happens inside our brain when images from our eyes are combined to give us depth perception. Only dense local two-frame algorithms that operate on each pixel of the images were examined. Since dense algorithms have not previously been examined thoroughly, because of the heavy calculations involved, it proved hard to find any really good algorithms.

The implemented algorithms were applied to single stereo pairs as well as to parallel streams of stereo video. Fast, efficient implementations are necessary when matching streaming video. It is also easy to find applications for matching on streams, for example simply by attaching two cameras to the computer.

Since the programming was carried out in the Windows environment, the DirectX interface was chosen. The Direct3D subset was used for the image processing and the DirectShow subset was used for handling the streaming media. Creating separate filter components in DirectShow increased the flexibility of the application: it became possible to choose different sources of stereo media and to write the resulting output either to the screen or to a file on the hard drive. The final results clearly indicated that there are advantages to utilizing the graphics hardware for image processing.
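To make the idea of dense local matching concrete, here is a minimal CPU sketch in C++. It is not the thesis implementation and not a Shader. It assumes rectified grayscale images, a sum-of-absolute-differences (SAD) cost over a square window, and a winner-takes-all choice of disparity. The names GrayImage and matchSad are hypothetical and introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Grayscale image stored row by row (hypothetical helper type for this sketch).
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // expected size: width * height

    // Read a pixel, clamping the coordinates to the image border.
    int at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// For every pixel of the left image, try each shift d in [0, maxDisparity]
// along the same row of the right image and keep the shift whose
// (2 * radius + 1)^2 window has the smallest sum of absolute differences.
// Assumption: a point at column x in the left image appears at column x - d
// in the right image (rectified cameras, left camera to the left).
std::vector<int> matchSad(const GrayImage& left, const GrayImage& right,
                          int maxDisparity, int radius) {
    assert(left.width == right.width && left.height == right.height);
    assert(maxDisparity >= 0 && radius >= 0);

    std::vector<int> disparity(left.pixels.size(), 0);
    for (int y = 0; y < left.height; ++y) {
        for (int x = 0; x < left.width; ++x) {
            long bestCost = std::numeric_limits<long>::max();
            int bestD = 0;
            for (int d = 0; d <= maxDisparity; ++d) {
                long cost = 0;
                for (int dy = -radius; dy <= radius; ++dy) {
                    for (int dx = -radius; dx <= radius; ++dx) {
                        cost += std::abs(left.at(x + dx, y + dy) -
                                         right.at(x + dx - d, y + dy));
                    }
                }
                // Strict comparison: ties keep the smallest disparity.
                if (cost < bestCost) {
                    bestCost = cost;
                    bestD = d;
                }
            }
            disparity[static_cast<std::size_t>(y) * left.width + x] = bestD;
        }
    }
    return disparity;
}
```

Every output pixel is computed independently of the others from a fixed neighbourhood of input pixels, which is the property that makes local algorithms of this kind a plausible fit for per-pixel Shader programs.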

Execution time of implementations

The graph below shows the execution time as a function of image size for two different implementations of the same matching algorithm. The lower line is the algorithm implemented with Shaders, while the upper line is the classical implementation.

Both implementations are written in C++ for efficiency and so that they can be compared. The graph clearly indicates that local matching algorithms can be carried out efficiently on the graphics hardware.
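A measurement of execution time against image size could be set up roughly as in the following C++ sketch. It is only an illustration of the method, not the harness used for the graph. The helper millisecondsFor and the stand-in workload absDiffSum are hypothetical names, and the workload is a simple per-pixel difference rather than a real matching algorithm.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in workload: sum of per-pixel absolute differences.
long absDiffSum(const std::vector<std::uint8_t>& a,
                const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    long sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    }
    return sum;
}

// Run `work` the given number of times and return the mean wall-clock time
// per run in milliseconds, measured with a monotonic clock.
template <typename Work>
double millisecondsFor(Work&& work, int repetitions) {
    assert(repetitions > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repetitions;
}

// Time the stand-in workload on a square image with the given side length.
double timeForSize(int side) {
    const std::size_t count = static_cast<std::size_t>(side) * side;
    const std::vector<std::uint8_t> a(count, 10);
    const std::vector<std::uint8_t> b(count, 250);
    volatile long sink = 0;  // keeps the compiler from discarding the work
    return millisecondsFor([&] { sink = absDiffSum(a, b); }, 10);
}
```

Calling timeForSize for a series of sizes, for example 64, 128, 256 and 512, gives one curve of the kind described above. When a GPU implementation is timed, the measurement would also need to wait for the graphics hardware to finish its work before the clock is stopped, since rendering commands are typically queued rather than executed immediately.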