Current systems that combine clothing recognition with face recognition usually draw a box below each detected face and treat it as the clothing region. If no face is detected, there is no clothing box either, which defeats the purpose of locating and identifying people when the face is not visible. To improve on this, we first perform person detection and tracking using poselets.
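To make the limitation of the face-anchored baseline concrete, here is a minimal sketch of how such a clothing box might be derived. The function name and the proportions (`width_scale`, `height_scale`, `gap`) are hypothetical values chosen for illustration, not parameters taken from any particular system.

```python
def clothing_box_from_face(face, width_scale=2.0, height_scale=2.0, gap=0.5):
    """Return a clothing box (left, top, width, height) placed below a face box.

    `face` is (x, y, w, h) in pixels, or None when no face was detected.
    All scale factors are illustrative assumptions.
    """
    if face is None:
        # No face means no clothing region: the weakness of this baseline.
        return None
    x, y, w, h = face
    box_w = width_scale * w
    box_h = height_scale * h
    center_x = x + w / 2.0
    left = center_x - box_w / 2.0
    top = y + h + gap * h  # leave a small gap for the neck
    return (left, top, box_w, box_h)


print(clothing_box_from_face((100, 50, 40, 40)))
print(clothing_box_from_face(None))
```

The second call returns `None`, which is exactly the case a person detector is meant to cover.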
Another challenge is that clothing changes over the course of an episode. We therefore divide the episode into meaningful scenes, within which an actor usually does not change clothing.
A key advantage of our system is that we do not need to explicitly label the person ID corresponding to each set of clothing. The first step clusters images of people with similar clothing, using agglomerative hierarchical clustering (in Matlab). Some person detections contain a detected and recognized face; for these, we use the face recognition results to label the containing cluster with that ID. Not all clusters get labeled, and the labels that are assigned are not always correct. For the unlabeled clusters, each image in the cluster is compared with the ID-labeled clusters and is assigned the ID whose clothing is most similar.
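The clustering and labeling steps above can be sketched as follows. This is a simplified pure-Python illustration, not the Matlab implementation used in the work. It assumes each person detection is summarized by a numeric clothing feature vector, uses naive average-linkage clustering with a distance `threshold`, labels a cluster by majority vote over its recognized faces, and measures similarity as average Euclidean distance. All of these choices and every function name are assumptions made for the example.

```python
import math
from collections import Counter


def average_linkage_clusters(features, threshold):
    """Naive agglomerative clustering: repeatedly merge the two closest
    clusters until the smallest average distance exceeds `threshold`."""
    clusters = [[i] for i in range(len(features))]

    def avg_dist(a, b):
        total = sum(math.dist(features[i], features[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        d, p, q = min(
            (avg_dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


def label_clusters(clusters, face_ids):
    """Majority vote over recognized faces; None if the cluster has none."""
    labels = []
    for members in clusters:
        votes = Counter(face_ids[i] for i in members if face_ids[i] is not None)
        labels.append(votes.most_common(1)[0][0] if votes else None)
    return labels


def identify(features, face_ids, threshold):
    """Assign an ID to every detection from clothing clusters."""
    clusters = average_linkage_clusters(features, threshold)
    labels = label_clusters(clusters, face_ids)
    labeled = [(m, l) for m, l in zip(clusters, labels) if l is not None]
    result = [None] * len(features)
    for members, label in zip(clusters, labels):
        for i in members:
            if label is not None:
                result[i] = label
            elif labeled:
                # Unlabeled cluster: take the ID of the most similar labeled cluster.
                def dist_to(entry, i=i):
                    group = entry[0]
                    return sum(math.dist(features[i], features[j]) for j in group) / len(group)
                result[i] = min(labeled, key=dist_to)[1]
    return result


# Toy data: two clothing groups plus one stray detection without a face.
features = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0.5, 0.2)]
face_ids = ["A", None, "B", None, None]
print(identify(features, face_ids, threshold=0.3))
```

In this toy run, the detections without a recognized face inherit an ID either from their own cluster or from the nearest labeled one, which mirrors the two-stage assignment described above.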
Interestingly, the accuracy of clothing-based identification is slightly better than that of the face recognition that was used to assign IDs in the first place. This suggests that the clustering groups people by clothing reliably.
Sample images of the method… (see captions for clarification)
If the above information is useful to you, please cite:
M. Tapaswi, M. Bäuml, and R. Stiefelhagen, “Knock! Knock! Who is it?” Probabilistic Person Identification in TV Series, CVPR 2012. (project page)