Supervised and Unsupervised Learning

In this first post, we will address the most basic forms of learning techniques. For all the upcoming posts, the goal of learning is to find an appropriate function f:\mathcal{X}\rightarrow \mathcal{Y} which maps an input x_i in the space \mathcal{X} to a label y_i in the label space \mathcal{Y}. Here, the input space holds the data (images, audio signals, etc.), or more typically some interesting features extracted from it (SIFT, MFCCs, etc.), and the label space is the list of possible classes. In person identification this would be the list of people's names, while in binary classification such as face vs. not-a-face there are just two possible labels.
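
To make the notation concrete, here is a tiny sketch (in Python; the feature values are invented toy numbers, as the original post contains no code) of what \mathcal{X} and \mathcal{Y} might look like for face vs. not-a-face classification:

```python
# Toy illustration of the input space X and the label space Y.
# The numbers below are made up purely for illustration.
X = [
    [0.12, 0.80, 0.45],  # x_1: a feature vector (e.g. a SIFT/MFCC descriptor)
    [0.90, 0.11, 0.33],  # x_2: another feature vector
]
label_space = ["face", "not-a-face"]  # Y: the two possible classes
y = ["face", "not-a-face"]            # y_i: the label assigned to each x_i

print(len(X), "examples,", len(label_space), "possible labels")
```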

In supervised learning, the machine is provided with many examples x_i along with the classes y_i they belong to. It is among the simplest and most studied forms of learning, and there are many popular classifiers that can be trained this way. However, it is also the most prohibitive to scale up, since each new concept needs to be taught separately with many examples, and a lot of manual effort is required to obtain the class labels y_i. Typical supervised learning techniques include Nearest Neighbour classifiers, Support Vector Machines (max-margin classifiers), Random Forests (ensembles of decision trees), Linear Regression (and other regressors), etc. A common problem is over-fitting to the training data; it is reduced by keeping a validation set (some data set aside for tuning parameters).
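
As an illustration, here is a minimal supervised-learning sketch, assuming Python with scikit-learn (a library choice made for this sketch, not something the original post prescribes): a max-margin classifier is fit on labelled examples, and a held-out validation set exposes over-fitting:

```python
# Minimal supervised-learning sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# x_i: feature vectors; y_i: class labels provided by human annotators.
X, y = load_digits(return_X_y=True)

# Keep some data aside as a validation set to detect over-fitting.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = SVC(kernel="rbf", C=1.0)  # a max-margin classifier
clf.fit(X_train, y_train)       # learn f: X -> Y from labelled examples

print("train accuracy:     ", clf.score(X_train, y_train))
print("validation accuracy:", clf.score(X_val, y_val))  # a large gap suggests over-fitting
```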

On the opposite side is unsupervised learning, where the data comes with no labels at all. The goal here is to automatically learn some underlying structure in the data. Most work focuses on grouping similar data points (in the feature space \mathcal{X}) into clusters. The most popular method is K-Means, where the data is partitioned into K clusters. The other common technique is agglomerative clustering, which has the advantage of not requiring the number K to be specified up front. However, it does need some other stopping criterion, such as a maximum distance between the clusters to be merged, or a maximum tree depth. (See: Clustering and Matlab)
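
A minimal clustering sketch in the same vein (again assuming Python with scikit-learn, whereas the post above points to Matlab) shows both options: K-Means with a fixed K, and agglomerative clustering stopped by a distance threshold instead of a preset K. The threshold value here is an arbitrary choice for illustration:

```python
# Minimal unsupervised-learning sketch (assumes scikit-learn is installed).
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Unlabelled data: only the feature vectors x_i are available.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-Means: the number of clusters K must be chosen up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Agglomerative clustering: instead of fixing K, keep merging the two
# closest clusters until their distance exceeds a threshold
# (5.0 is an arbitrary value chosen for this illustration).
agg = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0)
agg_labels = agg.fit_predict(X)
print("agglomerative clustering found", agg.n_clusters_, "clusters")
```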

The evaluation of clustering is quite tricky. One interesting way to do it is by measuring the manual effort required to label the data afterwards (Read: Unsupervised Metric Learning for Face Identification in TV Video, Cinbis et al., ICCV 2011). Other methods capture, in some way, the entropy of the clusters. The adjusted Rand index also seems to be a good way to measure clustering accuracy (but the raw Rand index alone is misleading!).
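
A small sketch of why the adjustment matters, using scikit-learn's implementations of both scores: putting every point in its own cluster still earns a deceptively high raw Rand index, while the adjusted score correctly drops to chance level (zero):

```python
# Comparing the raw and adjusted Rand index (assumes scikit-learn >= 0.24).
from sklearn.metrics import adjusted_rand_score, rand_score

truth = [0, 0, 0, 1, 1, 1]  # ground-truth grouping of six points

good = [1, 1, 1, 0, 0, 0]   # same grouping, different cluster ids
print(adjusted_rand_score(truth, good))  # 1.0 -- perfect agreement

singletons = [0, 1, 2, 3, 4, 5]          # every point in its own cluster
print(rand_score(truth, singletons))           # 0.6 -- misleadingly high
print(adjusted_rand_score(truth, singletons))  # 0.0 -- chance level
```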
