The C++ Augmented Reality Toolkit

4. Feature Extraction

Feature extraction is a term used for finding known elements within an image. These elements consist mainly of corners, lines, shapes and curves. The ARLib uses two dimensional square markers for its detection system, therefore the features required were corners and edges.

>4.1 Detecting the Corners

There is a variety of corner detection algorithms documented in many textbooks and research papers. A commonly used method is the Harris Corner Detector [HARRIS and STEPHENS 1988]. The basic mechanism behind the Harris detector is that a small square template window is centred on a point in the image. The template window is then slightly shifted in various directions and the average change in intensity is recorded. It is these changes in intensity that can indicate whether the template window is on a corner, see figure 4.1.

Figure 4.1</br>The Harris Corner Detector

Although deemed to be the preferred method for detecting corners, it was estimated that the algorithm would take too much computation time and would hinder the real-time target of ARLib.

Another common technique for detecting corners is known as the SUSAN Corner Detector [SMITH and BRADY 1997]. This technique uses a circular template mask which is placed over every pixel in the image. All pixels of a similar intensity to the central pixel are counted, and if the count is less than half of the number of pixels in the template, then it is assumed a corner exists within the circle. The actual location of the corner is indicated by performing a local minimum search within the template area, see figure 4.2.

Figure 4.2<br/>The Susan Corner Detector

Initial prototyping and testing showed that the SUSAN algorithm was too slow at processing to be used in a real-time application.

Another corner detection algorithm that uses a circular template is detailed in [EPFL 2004] and is referred to as the Keypoint Detector. This technique differs from the SUSAN method in that only the pixels on the circumference of the template are analysed. The idea behind this technique is that if two diametrically opposed pixels have a similar intensity to the central pixel, then the point is not a corner or feature point, see figure 4.3.

Figure 4.3<br/>The Keypoint Detector

Because of the way video images get digitised, some straight edges can be falsely recognised as corners. To overcome this problem, the neighbouring cells on the circumference are also tested, see figure 4.4.

Figure 4.4<br/>Testing Neighbouring Pixels in Keypoint Detector

This algorithm was initially implemented in ARLib as it performs well and has no real noticeable affect on performance. Figure 4.5 shows some sample video images with highlighted corners. It can be seen that there appeared to be more corners indicated than were actually visible. This was due to noise caused by the segmentation process.

Figure 4.5<br/>Sample Images showing Keypoint Detector results.

4.2 Detecting the Marker Edges

When a two dimensional square is viewed from any angle, its edges always remain straight, therefore if the four edges of the square can be detected, then its corners can also be calculated using simple line intersection routines.

Since its invention in 1962 by Paul Hough, the Hough transform method [DUDA and HART 1972] has been the main technique used for detecting lines within an image. The technique has been further advanced to detect arbitrary shapes and is called the Generalised Hough transform.

The principle behind the Hough transform is that there are an infinite number of lines that can pass through a single point. However, if many collinear points are examined, then there is only one line that is common to all of them, see figure 4.6. It is this simple concept that is used by the Hough transform.

Figure 4.6<br/>The Hough transform principle

The original method for calculating the lines in the Hough transform used the slope-intercept equation of:

Equation 4.1

Here a range of values are used for m and c to determine the lines for a particular point (x,y). A major problem with using the slope-intercept technique is that there is no way of knowing the range of values to use for m and c.

Another technique was tried [DAVIS 2005] that replaced the slope-intercept formula with one called the normal(θ,p) form:

Equation 4.2

Using this formula, the lines were represented using a sinusoidal wave. Points were plotted and multiple hits in (θ, p) space indicated the presence of a line. Multiple hits are highlighted in figure 4.7.

Figure 4.7<br/>Hough Transform Sinusoidal Wave.

The Hough transform is a very accurate technique for detecting lines within an image and functions well in noisy images too. It is, however, very computationally expensive and it is this reason that the Hough transform was not used in ARLib, as it cannot perform satisfactorily in a real-time scenario.

With the combination of these two problems, ie; the Hough transform not performing well and the corner detection not picking up all corners, another more efficient and reliable technique was needed.

In [MALIK 2002], a completely different approach is used to detect the corners and edges of the markers. The technique takes advantage of the fact that a marker comprises of a thick black border that is surrounded by a white edge, see figure 4.8.

Figure 4.8<br/>Marker Border

It can be seen that a potential marker’s boundary consists of a large connected region of black. To identify these regions, bounding boxes were created around the black border.

Looking for the black regions was a fairly simple process and the well known flood-fill technique [FOLEY et al 1990] was used. During the flood-fill process, the current bounding box was adjusted accordingly, see figure 4.9.

Figure 4.9<br/>Marker Bounding Box

The process is described in the following steps:

1. Starting at the bottom-left pixel of the image, each row is scanned until a black pixel is found.

2. When a black pixel is found, it is flagged as being processed and a bounding box is created around it.

3. The four unprocessed neighbouring pixels (above, below, left and right) are recursively scanned and processed and the extents of the bounding box are adjusted accordingly.

4. When no more black pixels can be found during the recursive process, the current bounding box is added to the list of regions that may potentially contain a marker.

5. The scanning of rows continues until another unprocessed black pixel if found, then steps 2 to 4 are repeated.

Figure 4.10<br/>Example bounding boxes

Because there is noise in an image, the number of bounding boxes could potentially be large, which would hinder the performance of the detection process. The problem was easily overcome by applying some simple checks prior to each bounding box being added to the current list:

1. If the bounding boxes area is smaller than a predefined value then it is deemed too small for processing and is excluded, see figure 4.11a.

2. If the aspect ratio of the box is below a defined threshold, then the assumption is made that either the region does not contain a square or, if it did contain a square, then it is being viewed from too sharp an angle for a satisfactory detection, see figure 4.11b.

Figure 4.11<br/>Bounding Box Exclusions. (a) Area and (b) Aspect Ratio too small.

Within each bounding box, a convex hull exists defined by the maximum extents of the black pixels. Because the markers were square, then the convex hull should contain four corners, see figure 4.12.

Figure 4.12<br/>Bounding Box and Convex Hull can help identify corners.

In [MALIK 2002], the assumption is made that a straight edge consists of approximately 50% black and 50% white pixels, while the corner has more than 50% white. To determine if an edge is a corner, a square template is placed centrally over the edge pixel. A count is then made of the black and white pixels and the four highest scoring counts are recorded. This entire corner detection process is included in the recursive flood fill routines.

The results from the corner detection process proved successful for most cases in ARLib but, like many corner detection routines, they performed best when the image being processed contained a marker that was relatively flat within the scene. Testing of this technique revealed that the corners were not detected when the marker was at an angle to the camera. This is because, in some instances, a corner can look almost like a straight edge, see figure 4.13.

Figure 4.13<br/>Corners appearing as a straight edge.

Further research failed to reveal an alternative corner technique that was capable of overcoming this problem therefore a custom corner detection method was devised and implemented.

Custom Corder Detector

A technique was needed for ARLib that would reduce or even take away the assumptions of other corner detection routines, in which a corner was determined by the ratio of black to white pixels. Because the bounding box and outer edges of the black region within the box are known, a geometric solution was possible.

If a bounding box was scaled so that it forms a square, then this simplified the problem even further, see figure 4.14.

Figure 4.14<br/>Scaling bounding boxes to form a square.

Once the bounding box was a square, the region corners could be identified by working out the furthest edge pixel from each of the bounding box corners, see figure 4.15.

Figure 4.15<br/>Detecting region corners from bounding boxes.

Although this technique succeeded where previous detectors had failed i.e. it detected almost straight corners, this technique also had a problem with one particular scenario. The corners were not detected correctly when the region within the bounding box formed a diamond shape. The problem was caused because two of the corners in a diamond shape were the same distance from two of the bounding box corners, therefore only two corners were detected, see figure 4.16.

Figure 4.16<br/>Two corners detected when diamond shape.

A simple solution to the problem was that, if four unique corners were not detected using the four corners of the bounding box, then four alternate points on the bounding box were used. In this case, the mid-point of each of the corners were used, see figure 4.17. By using both sets of bounding box reference points, it guaranteed that fours corners were always found.

Figure 4.17<br/>Alternate bounding box points.
<< 3. Image Segmentation 5. Marker Detection >>