Face Detection using AdaBoost

2013-08-20 Machine Learning, AdaBoost, Facial Recognition

One of the main tasks in machine learning is detection. A typical detection algorithm consists of three steps: it takes in the data D, transforms D into a feature space F that allows the computer to make decisions, and then applies a rule R to the features to decide whether an event is detected or not. This is much easier said than done, and it is very application specific. I will motivate this with a short example, to increase appreciation for AdaBoost.
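The three-step pipeline can be sketched abstractly. The feature map and rule below are illustrative placeholders of my own choosing, not a real detector:

```python
# A minimal sketch of the pipeline: data D -> features F(D) -> rule R.
# The specific feature map and threshold here are made up for illustration.

def extract_features(data):
    """Map raw data D into a feature space F (here: mean and max)."""
    return (sum(data) / len(data), max(data))

def decision_rule(features, threshold=0.5):
    """Apply a rule R: flag a detection when the mean feature exceeds a threshold."""
    mean_value, _ = features
    return mean_value > threshold

def detect(data):
    return decision_rule(extract_features(data))

print(detect([0.9, 0.8, 0.7]))  # -> True
print(detect([0.1, 0.2, 0.3]))  # -> False
```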

Let's take face detection as an example. Given any image, we must supply the computer with a set of features that will allow it to find the face. Choosing those features can go hand in hand with choosing the detection method. For simplicity, take a Support Vector Machine: a very effective algorithm that maps the data into a high (possibly infinite) dimensional space in order to find a separating hyperplane between the classes. (This makes intuitive sense if you think of a knot: given another dimension in which to pull it, it would come undone.) Because you can't make any assumption about one part of the image based on what you see elsewhere, you must take as much of the image into consideration as possible, and the SVM can then find the separating hyperplane. The problem is that when you later use the trained SVM to evaluate a new image, you need to recompute all of those features for that image. Furthermore, while the SVM is known to give a high detection rate, there is not much you can do about false positives (marking something as a face that isn't one). Enter AdaBoost!

Ideally, a detector should find all targets and ignore everything else (that is, it should minimize the false positive rate). Say you had a set of detectors, each of which detects targets based on a few simple criteria, and each of which has some trouble separating targets from non-targets, essentially marking too many "things" as targets. Think of this as a series of checkpoints, where each checkpoint detects 99% of the targets but incorrectly passes 30% of the non-targets. With 10 of those in a row, the cascade ultimately gives a detection rate of about 90% with a false positive rate of about 6 x 10^(-6)! How? Very simply: the first stage lets through 99% of the targets and 30% of the non-targets. The next stage, having similar capabilities, again finds 99% of the targets, thus letting through (.99)^2 = 98% of the original targets, and similarly lets through 30% of the remaining 30% of non-targets, and so on, so after 10 stages (.99)^10 ≈ 0.90 of the targets and (.3)^10 ≈ 6 x 10^(-6) of the non-targets survive. This is a brilliant idea, and the underpinning of the AdaBoost cascade. Two questions must be answered: what kind of features can make such a scheme possible, and what type of classifiers can be used?
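The cascade arithmetic above is easy to verify directly:

```python
# Cascade arithmetic from the paragraph above: 10 stages in a row, each
# keeping 99% of the targets and passing 30% of the non-targets.
stages = 10
detection_rate = 0.99 ** stages        # fraction of targets surviving all stages
false_positive_rate = 0.30 ** stages   # fraction of non-targets surviving all stages

print(f"detection rate: {detection_rate:.3f}")        # ~0.904, i.e. about 90%
print(f"false positive rate: {false_positive_rate:.1e}")  # ~5.9e-06, about 6 in a million
```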

Regarding the features: it should be clear that the larger the set of candidate features, the higher the odds that good ones will be found for detecting targets. This means the algorithm should not only find a set of good classifiers, but also select the appropriate features. This is the beauty of AdaBoost: it does both in one step! It lets the features do all the work by defining the simplest possible classifier and then finding the feature for which the classification error is minimized. The exact implementation details can be found in [1], the Viola-Jones paper applying AdaBoost to face detection. The classifier simply learns a threshold value above which an object is marked as a target. The feature whose detection error is smallest is chosen, and its threshold is lowered until the desired detection and false positive rates are achieved. That feature is then removed from consideration, and the procedure is repeated on the remaining features until the overall detection and false positive rates are met. Throughout this process, the small set of selected features is recorded. This means that when testing a new image, all that needs to be done is to evaluate those, say, 10 features to determine, fairly accurately, whether a face is present.
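The core selection step can be sketched with the simplest possible classifier, a threshold on a single feature (a "decision stump"). This is a toy illustration under my own assumptions; the full algorithm in [1] additionally reweights the training examples after each round, which is omitted here:

```python
# Sketch: for each candidate feature, learn a threshold classifier and keep
# the feature whose classification error is smallest. (Real AdaBoost also
# reweights the examples between rounds; that step is left out for brevity.)

def stump_error(values, labels, threshold):
    """Error of the rule 'predict target iff feature value > threshold'."""
    predictions = [v > threshold for v in values]
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)

def best_threshold(values, labels):
    """Try each observed value as a threshold; return (threshold, error)."""
    candidates = sorted(set(values))
    return min(((t, stump_error(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])

def select_feature(feature_matrix, labels):
    """Pick the feature index whose best stump has the lowest error."""
    results = [best_threshold(column, labels) for column in feature_matrix]
    best_index = min(range(len(results)), key=lambda i: results[i][1])
    return best_index, results[best_index]

# Toy data: feature 0 separates the classes cleanly, feature 1 is noise.
features = [
    [0.1, 0.2, 0.8, 0.9],   # feature 0, one value per example
    [0.5, 0.9, 0.4, 0.6],   # feature 1, one value per example
]
labels = [False, False, True, True]

index, (threshold, error) = select_feature(features, labels)
print(index, threshold, error)  # -> 0 0.2 0.0
```

Repeating this selection on the remaining features, as described above, builds up the small set of features that the final detector evaluates.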

The simplicity of this algorithm is beautiful. To make it work, a large pool of features and many training examples are required, so training time may be high; but the resulting detector is so efficient to evaluate that it is the de facto method for face detection today (Facebook and your smartphone use it).

References
[1] P. Viola and M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," 2001.
