Person Tracking on Mobile Robots

Tracking a particular person among several people is a fundamental yet challenging task for a mobile robot that interacts with humans. Features of the person of interest, such as the face or clothing, need to be learned online and recognized in real time. We present the methods our home-assistant robot uses to track a person robustly.

Robust Real-Time Face Recognition

Real-time face detector

A person's face is a feature that uniquely identifies that person. Given an image captured by the robot's webcam, we use the Viola-Jones real-time face detection algorithm to find a bounding box around the person's face, if one is present. Once the face is located in the image, we want to apply a real-time face recognition algorithm to identify the face, and thus the person. Although many promising face recognition methods have been proposed, most are either not accurate enough for our problem (which involves a moving camera and/or face and variable illumination conditions) or too computationally expensive to run in real time.
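Much of the speed of the Viola-Jones detector comes from integral images, which allow rectangular (Haar-like) features to be evaluated with a handful of table lookups regardless of rectangle size. The following is a minimal sketch of that building block only, not of the full boosted cascade and not of the robot's actual code; the function names and the toy image are illustrative assumptions.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Return a summed-area table padded with a leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the w-by-h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Horizontal two-rectangle feature: left half minus right half (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 2x4 grayscale image (hypothetical values).
toy = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
toy_ii = integral_image(toy)
```

Each rectangle sum costs four lookups in the table, so a detector can evaluate many such features per window at a cost that does not grow with the window size.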

Extracted SIFT features

To be robust to face misalignment caused by pose variations, we extract scale-invariant feature transform (SIFT) features from the face bounding boxes. SIFT locates keypoints in an image and describes them in a way that is invariant to scale and rotation. These descriptors have been shown to match robustly across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination.

Matched SIFT features between same and different people

To identify a face, we compare the SIFT features in the test image to those in the training image. The accompanying images show examples of matched SIFT features between images of the same person (left) and of different people (right).
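One common way to compare two sets of SIFT descriptors is nearest-neighbor matching with a distance-ratio test, followed by counting the surviving matches per known person. The sketch below assumes that strategy; the text does not state which matching rule the robot uses, and the names `count_matches` and `identify`, the `ratio` and `min_matches` thresholds, and the 2-D toy descriptors (real SIFT descriptors have 128 dimensions) are all illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence

Descriptor = Sequence[float]


def count_matches(test: List[Descriptor], train: List[Descriptor],
                  ratio: float = 0.8) -> int:
    """Count test descriptors whose nearest training descriptor is
    clearly closer than the second nearest (distance-ratio test)."""
    if len(train) < 2:
        return 0
    matches = 0
    for d in test:
        dists = sorted(math.dist(d, t) for t in train)
        if dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def identify(test: List[Descriptor], gallery: Dict[str, List[Descriptor]],
             min_matches: int = 2) -> Optional[str]:
    """Return the gallery name with the most matches, or None if too few."""
    best_name, best_count = None, 0
    for name, train in gallery.items():
        n = count_matches(test, train)
        if n > best_count:
            best_name, best_count = name, n
    return best_name if best_count >= min_matches else None


# Toy gallery with 2-D stand-ins for descriptors (hypothetical data).
gallery = {
    "alice": [(0, 0), (10, 0), (0, 10)],
    "bob": [(50, 50), (60, 50), (50, 60)],
}
probe = [(0.1, 0), (10, 0.2), (0, 9.9)]
```

Here `identify(probe, gallery)` selects the person whose training descriptors receive the most unambiguous matches, and returns `None` when no person reaches `min_matches`.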

Heterogeneous Inter-Classifier Feedback

Concept of inter-classifier feedback

An overall classifier is built from two or more heterogeneous sub-classifiers. We divide the characteristics they classify into two groups: primary and secondary. A primary characteristic must be unique, but classifying it may be computationally expensive or susceptible to noisy input data. A secondary characteristic may be ambiguous, but it is computationally cheaper and more robust to noise; it can compensate for the shortcomings of a classification based solely on primary characteristics and act as a fall-back classifier. Dividing one classification problem into several lets the sub-classifiers give feedback to one another, which can improve both the accuracy and the speed of the overall classifier.

Person tracking with inter-classifier feedback

Tracking a person solely by the face is not robust enough for our assistant robot, because the face is not always detected in the video stream. We therefore track a person by face (primary characteristic) and shirt color (secondary characteristic). Inter-classifier feedback improves both the accuracy and the speed of the overall classifier. The face recognizer is more robust than the shirt classifier to color changes caused by changes in ambient brightness, so it can suggest that the shirt classifier re-train itself under the new lighting condition. Conversely, since SIFT features are sensitive to directed lighting, the shirt classifier can suggest that the face recognizer add a misclassified face as additional training data. The face detector can also be skipped every other frame to improve the frame rate without hurting person tracking, because the shirt classifier backs it up.
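The feedback rules described above can be sketched as a per-frame decision loop. This is an illustration under stated assumptions rather than the robot's implementation: the classifier outputs are stubbed as precomputed labels, the `Observation` structure and the feedback strings are hypothetical, and the sketch assumes a shirt match is trusted whenever the face recognizer disagrees.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Observation:
    """Stubbed classifier outputs for one frame (hypothetical structure)."""
    face_found: bool            # did the face detector return a bounding box?
    face_label: Optional[str]   # face recognizer output, None if unrecognized
    shirt_label: Optional[str]  # shirt-color classifier output, None if no match


def track(frames: List[Observation], target: str,
          face_every: int = 2) -> List[Tuple[bool, Optional[str]]]:
    """Return (tracked, feedback) for each frame; feedback names a
    suggested retraining step or is None."""
    results = []
    for i, obs in enumerate(frames):
        # Run the costly face pipeline only on every `face_every`-th frame.
        use_face = i % face_every == 0 and obs.face_found
        face_ok = use_face and obs.face_label == target
        shirt_ok = obs.shirt_label == target

        feedback = None
        if face_ok and not shirt_ok:
            # Face confirms the person but the shirt color no longer matches:
            # suggest re-training the shirt classifier under the new lighting.
            feedback = "retrain_shirt"
        elif use_face and not face_ok and shirt_ok:
            # Shirt confirms the person but the face was misclassified:
            # suggest adding this face to the recognizer's training data.
            feedback = "add_face_sample"

        results.append((face_ok or shirt_ok, feedback))
    return results


# Five simulated frames for a target labeled "alice" (hypothetical data).
frames = [
    Observation(True, "alice", "alice"),  # both classifiers agree
    Observation(True, "alice", None),     # face skipped on odd frames, shirt fails
    Observation(True, "alice", None),     # face succeeds, shirt fails
    Observation(False, None, "alice"),    # no face, shirt carries the track
    Observation(True, None, "alice"),     # face misclassified, shirt succeeds
]
outcome = track(frames, "alice")
```

The second simulated frame shows the cost of frame skipping when the shirt classifier has not yet been re-trained: the track is lost for that one frame and recovered on the next, when the face recognizer runs again and triggers the re-training suggestion.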

The person-detection method described above was a key component of our RoboCup@Home 2007 competition entry. A QuickTime video of our presentation in the finals can be watched here.