# Supervised and Unsupervised Learning

In this first post, we will address the most basic forms of learning techniques. For all the upcoming posts, the goal of learning is to find an appropriate function $f:\mathcal{X}\rightarrow \mathcal{Y}$ which maps some input $x_i$ in space $\mathcal{X}$ to $y_i$ in the label space $\mathcal{Y}$. Here, the input space is the raw data (images, audio signals, etc.) or, more typically, some interesting features extracted from it (SIFT, MFCCs, etc.), and the label space is the list of possible classes. In person identification this would be the list of people's names, while in binary classification (face or not-a-face) we have just two possible labels.

In supervised learning the machine is provided with many examples $x_i$ along with the classes $y_i$ they belong to. It is among the simplest and most studied forms of learning, and there are many popular classifiers which can be trained this way. However, it is also the most prohibitive to scale up, since each new concept needs to be taught separately with many examples, and a lot of manual effort goes into obtaining the class labels $y_i$. Typical supervised learning techniques include Nearest Neighbour classifiers, Support Vector Machines (max-margin classifiers), Random Forests (decision trees), Linear Regression (and other regressors), etc. A common problem is over-fitting to the training data; it is usually reduced by keeping a validation set (some data set aside for tuning parameters).
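As a concrete illustration, here is a minimal supervised-learning sketch in Python using scikit-learn (an assumption; any classifier library would do). It trains a max-margin classifier (a linear SVM) on toy 2-D features with hypothetical face/not-a-face labels, keeping some labelled data aside as a validation set:

```python
# Minimal supervised-learning sketch (assumes scikit-learn is installed).
# The toy data and the face/not-a-face labels are made up for illustration.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# x_i: 2-D feature vectors; y_i: class labels (0 = not-a-face, 1 = face)
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9],
     [0.15, 0.1], [0.85, 0.95], [0.2, 0.25], [0.9, 0.9]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

# Keep some labelled data aside as a validation set for tuning/checking over-fitting
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="linear")  # a max-margin classifier
clf.fit(X_train, y_train)   # learn f from (x_i, y_i) pairs

print(clf.predict([[0.05, 0.1]]))  # predicted class for a new input
print(clf.score(X_val, y_val))     # accuracy on the held-out validation set
```

Accuracy on the validation set, rather than on the training set, is what tells us whether the classifier has over-fit.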

On the opposite side is unsupervised learning, where the data comes with no labels at all. The goal here is to automatically learn some underlying structure in the data. Most work is on grouping similar data (in feature space $\mathcal{X}$) into clusters. The most popular method is K-Means, where the data is sorted into "K" clusters. The other common technique is Agglomerative clustering, which has the advantage of not requiring the number "K" to be specified. However, it does need some other stopping criterion, such as the distance between the clusters to be merged, or the tree depth. (See: Clustering and Matlab)
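A minimal K-Means sketch, again with scikit-learn (an assumption; the linked post uses Matlab, but the idea is identical). The points carry no labels; we only ask for K = 2 groups:

```python
# Minimal K-Means clustering sketch (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled points in feature space: two natural groups, far apart
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.1, 0.1],
              [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # the K learned centroids
```

Note that K-Means only assigns arbitrary cluster indices; unlike supervised learning, nothing here says which cluster is which class.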

The evaluation of clustering is quite tricky. One interesting approach is to measure the manual effort required to label the data after clustering (Read: Unsupervised Metric Learning for Face Identification in TV Video, Cinbis et al. ICCV2011). Other methods capture, in some form, the entropy of the clusters. The Adjusted Rand index also seems to be a good way to measure clustering accuracy (but the plain Rand index alone is bad, since it rewards chance agreement!).
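The Adjusted Rand index is easy to try out; a minimal sketch with scikit-learn (an assumption for the library choice). A key property is that it only compares partitions, so the arbitrary cluster indices from K-Means do not matter:

```python
# Minimal Adjusted Rand index sketch (assumes scikit-learn).
from sklearn.metrics import adjusted_rand_score

truth   = [0, 0, 0, 1, 1, 1]  # ground-truth classes
perfect = [1, 1, 1, 0, 0, 0]  # same partition, cluster indices relabelled
mixed   = [0, 1, 0, 1, 0, 1]  # a poor clustering of the same points

print(adjusted_rand_score(truth, perfect))  # 1.0: a perfect match
print(adjusted_rand_score(truth, mixed))    # near 0 or negative: poor match
```

An adjusted score of 1 means the partitions agree exactly; scores near 0 are what random labellings produce, which is exactly the chance correction the unadjusted Rand index lacks.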