Manifold Learning

Machine learning has faced a strange curse of dimensionality, where it becomes difficult to compare data points when they are really long vectors. This arises from the fact that most distances are normally computed in the Euclidean space and this gets quite complicated (see explanation of hypersphere/hypercube).

There are many dimensionality reduction techniques, the most famous being Principal Component Analysis which in fact works quite well when the data points are somewhat linearly distributed. However it produces disastrous results in case of non-linear distributions.

What is a manifold? The underlying idea is that in very large dimensional spaces, data points of physical processes usually tend to occur in restricted regions, and are almost never uniformly distributed through the entire space. A manifold is an imaginary surface on which these data-points are assumed to lie. Furthermore the key concept is that this manifold has very few degrees of freedom, and hence can be represented in a low-dimensional space.

The distance between the data points is now computed as the shortest distance while tracing along the manifold. For example, consider our Earth. To go from Delhi to New York you could drill through the ground and obtain the shortest distance in 3-D Euclidean space. However this is infeasible and the true shortest distance is actually the geodesic distance, the one when traveling on the surface of Earth. Here our Earth’s surface is our manifold in the large universe!

Given the data points, the objective is to find this manifold so that data-points close to each other on the manifold surface can be classified as belonging together instead of the ones which are close in Euclidean space. However finding this exact geometric shape is truly a Herculean task in most cases. Hence, the objective is transformed to find a function which can map the data from the high dimensional space to a simpler low dimensional space, such that the distance between the points in the low dimension corresponds to the geodesic distance as computed on the manifold.

Techniques such as ISOMAP and LLE developed in 2000 created quite a stir and were published in Science!! Here is a pretty interesting tutorial on manifolds, from which the above gist appears.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s