Their main idea is quite simple. First, the problem is converted into a graph: for example, associate each random variable or observation with a node, and connect with an edge those nodes that appear related to each other. Cliques of this graph are then considered together. Each maximal clique (chosen so that every node lies in at least one of them) is associated with a strictly positive potential function, and the joint probability of the model is simply the normalised product of the individual clique potential functions.
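As a minimal sketch of that recipe, here is a toy MRF over three binary variables. The graph, its cliques, and the "agreement" potential are all illustrative assumptions, not anything prescribed by the theory; the point is only that the joint is a normalised product of strictly positive clique potentials.

```python
from itertools import product

# Toy chain graph A - B - C: edges encode "seems related".
variables = ["A", "B", "C"]

# The maximal cliques of this chain are just its edges; every node
# is covered by at least one of them.
cliques = [("A", "B"), ("B", "C")]

def potential(clique, assignment):
    """A strictly positive potential: favour agreeing neighbours."""
    i, j = clique
    return 2.0 if assignment[i] == assignment[j] else 1.0

def unnormalised(assignment):
    """Product of the clique potentials for one full assignment."""
    p = 1.0
    for c in cliques:
        p *= potential(c, assignment)
    return p

# Joint probability = product of clique potentials, normalised by Z.
assignments = [dict(zip(variables, vals)) for vals in product([0, 1], repeat=3)]
Z = sum(unnormalised(a) for a in assignments)
joint = {tuple(a.values()): unnormalised(a) / Z for a in assignments}
```

With these made-up potentials, fully agreeing configurations such as (0, 0, 0) come out more probable than disagreeing ones, and the probabilities sum to one by construction.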
Now, since we require positive potential functions, a typical choice is the exponential: a potential of the form psi(x) = exp(-E(x)), where E is an "energy function". It is easy to see that the joint probability (a product of the individual potentials) is then the exponential of a negated sum of energies; and, following the thermodynamic analogy, lower energies correspond to higher probabilities.
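A quick numerical sketch of that correspondence, with two entirely made-up configurations and energies: exponentiating the negated energy and normalising gives a distribution in which the lower-energy configuration is the more probable one.

```python
import math

# Boltzmann-style potential: psi(x) = exp(-E(x)).  The energies below
# are arbitrary illustrative values.
energies = {"config_low_energy": 1.0, "config_high_energy": 3.0}

# Unnormalised probabilities, then normalise by the partition sum Z.
unnorm = {k: math.exp(-e) for k, e in energies.items()}
Z = sum(unnorm.values())
probs = {k: v / Z for k, v in unnorm.items()}
```

Because exp is monotonic, the ordering of energies is always the reverse of the ordering of probabilities, whatever the actual values.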
These energy functions are chosen for each clique so that the desired response has lower energy and the undesired one higher energy. Typically they are functions of the random variables involved in that clique. They also include "constant" parameters (weights) so that more important cliques can exert dominance over the total probability of the model; these parameters are then learnt.
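One common way to realise this, sketched below under my own illustrative choices of features and weights, is a log-linear form: each clique contributes a feature multiplied by a learnable weight, and the total energy is the negated weighted sum. A clique with a larger weight pulls the joint distribution more strongly towards configurations it favours.

```python
import math

def energy(x, weights):
    """Log-linear energy E(x) = -sum_c w_c * f_c(x_c) over two
    pairwise "agreement" features (illustrative assumptions)."""
    f = [1.0 if x[0] == x[1] else 0.0,   # clique over variables 0, 1
         1.0 if x[1] == x[2] else 0.0]   # clique over variables 1, 2
    return -sum(w * fi for w, fi in zip(weights, f))

# The first clique is weighted more heavily, so agreement there
# matters more to the joint probability.  Weights are made up; in
# practice they would be learnt.
weights = [2.0, 0.5]
configs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
unnorm = {x: math.exp(-energy(x, weights)) for x in configs}
Z = sum(unnorm.values())
probs = {x: u / Z for x, u in unnorm.items()}
```

Here (0, 0, 1) satisfies only the heavily weighted first clique, while (0, 1, 1) satisfies only the lightly weighted second one, so the former comes out more probable.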
MRFs seem to be a really nice way of combining multiple observations or pattern-recognition tasks directed towards a common goal. The probabilistic approach followed here can be expected to beat the heuristics that would otherwise be employed. More on algorithms for training and testing in posts to come!
Bishop's book Pattern Recognition and Machine Learning gives a brief introduction to graphical models in Chapter 8, which is available as the book's free sample chapter.