The concept of CT is simple. It operates on grayscale images: take any pixel and its 8 neighbours. Assign each neighbour a bit, 0 if its value is lower than the center pixel and 1 if it is higher. Reading these bits in row order gives an 8-bit string, which becomes the new transform value at that pixel. The central pixel itself contributes no bit.
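The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the function name `census_transform` is made up for this example, the image is assumed to be a list of rows of integers, border pixels are simply left at 0, and a neighbour equal to the center is assumed to map to 0 (the description above does not say how ties or borders are handled).

```python
def census_transform(img):
    """3x3 census transform of a grayscale image given as a list of rows.

    Assumptions (not specified in the text): border pixels stay 0,
    and a neighbour equal to the center produces a 0 bit.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            # Visit the 3x3 window in row order, skipping the center pixel.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    bit = 1 if img[y + dy][x + dx] > center else 0
                    code = (code << 1) | bit
            out[y][x] = code
    return out


patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]

# Neighbours in row order: 10 20 30 40 | 60 70 80 90 against center 50.
print(format(census_transform(patch)[1][1], "08b"))  # 00001111
```

The first neighbour visited ends up in the most significant bit; any fixed ordering works as long as it is used consistently.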
The MCT makes one small change: each pixel is compared with the mean of the 3×3 block rather than with the center value. Because the reference is now the mean, the central pixel can take part in the comparison as well, so the same operation yields 9 bits, which form the transform value for that pixel. Other block sizes such as 5×5 can be used for both transforms, but 3×3 is usually the favoured one.
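A comparable sketch for the mean-based variant follows. As before, this is an illustration under stated assumptions rather than the paper's implementation: the name `modified_census_transform` is hypothetical, border pixels are left at 0, and a pixel exactly equal to the mean is assumed to give a 0 bit. The comparison `9 * v > total` is equivalent to `v > mean` for a 3×3 block and avoids floating-point arithmetic.

```python
def modified_census_transform(img):
    """3x3 modified census transform of a grayscale image (list of rows).

    Assumptions (not specified in the text): border pixels stay 0,
    and a pixel equal to the block mean produces a 0 bit.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # All 9 pixels of the block in row order, center included.
            block = [img[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            total = sum(block)
            code = 0
            for v in block:
                # v > mean  <=>  9 * v > total
                code = (code << 1) | (1 if 9 * v > total else 0)
            out[y][x] = code
    return out


spot = [[10, 10, 10],
        [10, 100, 10],
        [10, 10, 10]]

# Block mean is 20; only the center pixel lies above it.
print(format(modified_census_transform(spot)[1][1], "09b"))  # 000010000
```

With this tie convention, a completely flat patch yields 0 while the isolated bright pixel above yields a non-zero code. A center-based comparison that maps "lower" to 0 would give 0 for both patches, which hints at why including the center bit can be useful.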
An example figure from the paper "Face Detection with Modified Census Transform" is shown here. It shows that even a large change in global illumination, or an illumination gradient across the image, has little effect on the local pattern.
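The robustness to global illumination can be checked numerically on a single patch. The sketch below is a simplified model, not taken from the paper: it assumes the illumination change is a uniform positive gain plus an offset with no clipping, and the helper `mct_code` and the pixel values are invented for the illustration. Under that model the 9-bit code is unchanged, because scaling and shifting every pixel scales and shifts the mean in the same way.

```python
def mct_code(patch):
    """9-bit mean-based census code of one 3x3 patch (list of 3 rows)."""
    values = [v for row in patch for v in row]
    total = sum(values)
    code = 0
    for v in values:
        # v > mean  <=>  9 * v > total (integer arithmetic only)
        code = (code << 1) | (1 if 9 * v > total else 0)
    return code


patch = [[12, 40, 35],
         [18, 27, 90],
         [15, 22, 60]]

# Hypothetical global illumination change: gain 2, offset 20, no clipping.
brighter = [[2 * v + 20 for v in row] for row in patch]

print(format(mct_code(patch), "09b"))     # 010001001
print(format(mct_code(brighter), "09b"))  # 010001001
```

A spatial gradient or saturation at 0 or 255 is not covered by this model, and in those cases some bits may flip.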