Probabilistic Fusion for Person Recognition

The second contribution of my Master's thesis was to combine, in a probabilistic manner, information from three recognition modalities (face recognition, clothing-based recognition, and speaker recognition) with constraints such as the fact that the same person cannot appear twice at the same time. We used Markov Random Fields, which I described almost a year ago when I was just starting to learn what they are! 🙂

Graphical Model for Person Recognition (representation of TV series)


As seen in the graph above, the TV series is first split into scenes, using the special sequences in TBBT, and then into shots (abrupt camera / view changes). Within each shot, each person track is associated with a random variable vector “ID”, whose entries take values between 0 and 1 depending on the supporting evidence. The node F provides face information (if it exists) and the node C provides clothing information to each person identity track. The speaker information, on the other hand, is captured via a presence node. This node says that at least one of the people appearing in the shot should be the speaker, which is true most of the time. It allows us to capture any kind of presence information (transcripts, for example), and does not require us to perform speaker detection by analyzing the lip movement of faces.

Finally, within a shot, two identity nodes cannot vote for the same person! This constraint, called the “uniqueness” potential, is captured by the red lines joining each identity node to the others.
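To make the two constraints a bit more concrete, here is a toy sketch in Python. The actual potentials are in the paper/thesis and are not reproduced here; the function names, the product-based uniqueness term, and all the scores below are my own illustrative assumptions, not the formulas we used.

```python
# Toy sketch only: these are NOT the potentials from the paper.
# Each track's "ID" is a list of per-character scores in [0, 1].

def uniqueness_penalty(id_a, id_b):
    """Grows when two tracks in the same shot put weight on the same person."""
    return sum(a * b for a, b in zip(id_a, id_b))

def presence_penalty(ids, present):
    """Grows when no track in the shot supports a person known to be present."""
    best = [max(track[k] for track in ids) for k in range(len(present))]
    return sum(p * (1.0 - b) for p, b in zip(present, best))

# Hypothetical scores for three candidate characters.
track_1 = [0.9, 0.1, 0.0]
track_2 = [0.8, 0.2, 0.0]   # competes with track_1 for character 0
track_3 = [0.0, 0.1, 0.9]   # does not compete with track_1
speaker = [0.0, 0.0, 1.0]   # presence evidence: character 2 is speaking

clash = uniqueness_penalty(track_1, track_2)      # large: both vote for 0
no_clash = uniqueness_penalty(track_1, track_3)   # small: different people

speaker_missing = presence_penalty([track_1, track_2], speaker)
speaker_found = presence_penalty([track_1, track_3], speaker)
```

In this toy version, the pair of tracks that both vote for character 0 gets a much larger uniqueness penalty, and the shot where nobody is identified as the speaker gets a larger presence penalty.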

We model all the above links using energy functions, which are left out here (refer to the paper/thesis for more details). The energy functions push each identity towards the weighted combination of its face and clothing counterparts, while satisfying the constraints imposed by the presence (speaker) and uniqueness potentials. This is achieved using energy minimization. The various components can be weighted differently; however, in our experiments we found that varying the weights had an insignificant impact. This is a good thing, since one can choose weights like all equal, or equal for modalities and double for penalties, and not experience drastic changes in performance.
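As a rough illustration of how such a weighted energy behaves, here is a minimal Python sketch for a single shot. It is heavily simplified and not the method from the paper: it assumes hard labels instead of soft ID vectors, uses exhaustive search instead of a proper optimizer, and the names `shot_energy` and `minimize_shot`, the cost terms, and the scores are all made up for the example.

```python
from itertools import product

def shot_energy(labels, face, cloth, speaker,
                w_face=1.0, w_cloth=1.0, w_presence=1.0, w_unique=1.0):
    """Energy of one hard labeling of the tracks in a shot (toy version)."""
    energy = 0.0
    # Modality terms: cheap when the chosen person has a high score.
    for t, k in enumerate(labels):
        energy += w_face * (1.0 - face[t][k])
        energy += w_cloth * (1.0 - cloth[t][k])
    # Presence term: pay if the speaker is not among the labeled tracks.
    if speaker not in labels:
        energy += w_presence
    # Uniqueness term: pay for every pair of tracks sharing a label.
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] == labels[j]:
                energy += w_unique
    return energy

def minimize_shot(face, cloth, speaker, n_people, **weights):
    """Exhaustive search over all labelings; only feasible for tiny shots."""
    candidates = product(range(n_people), repeat=len(face))
    return min(candidates,
               key=lambda labels: shot_energy(labels, face, cloth,
                                              speaker, **weights))

# Two tracks, three candidate people, hypothetical scores.
face = [[0.60, 0.30, 0.10], [0.55, 0.40, 0.05]]
cloth = [[0.50, 0.40, 0.10], [0.45, 0.50, 0.05]]
speaker = 1  # presence evidence says person 1 is in the shot

no_penalties = minimize_shot(face, cloth, speaker, 3,
                             w_presence=0.0, w_unique=0.0)
equal_weights = minimize_shot(face, cloth, speaker, 3)
double_penalties = minimize_shot(face, cloth, speaker, 3,
                                 w_presence=2.0, w_unique=2.0)
print(no_penalties, equal_weights, double_penalties)  # (0, 0) (0, 1) (0, 1)
```

Without the penalties, both tracks get labeled as person 0. With them, the second track switches to person 1, and doubling the penalty weights gives the same answer in this toy case. This is only a small example of the behaviour, not evidence for the experimental result mentioned above.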


If the above information is useful to you, please cite:
M. Tapaswi, M. Bäuml, and R. Stiefelhagen, “Knock! Knock! Who is it?” Probabilistic Person Identification in TV Series, CVPR 2012. (project page)

