JPEG – Compression Method

JPEG is the most famous and most widely used image file format. For a little history: it was approved by the ISO in 1994 and given the number ISO/IEC 10918-1; it is also available as the ITU-T T.81 standard. The JPEG standard defines both the compression and decompression methods (the codec) and the file stream (header + data).

JPEG compression is essentially lossy, i.e. one cycle of compression followed by decompression will NOT yield an image that is an EXACT replica of the original. Although there are lossless and progressive versions, we will restrict this discussion to the general (baseline) variant.

The compression algorithm is as follows:

Colour Space Transform: The RGB image is converted to YCbCr as discussed in the previous post, and then the chrominance components are down-sampled. The generally used scheme is 4:2:0, again discussed earlier. All further processing is done on the Y, Cb and Cr layers independently.
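As a rough sketch (assuming a NumPy RGB array with even width and height, and the BT.601 conversion coefficients used by JPEG/JFIF), the colour transform and the 4:2:0 down-sampling could look like this:

```python
import numpy as np

def rgb_to_ycbcr_420(rgb):
    """Convert an HxWx3 uint8 RGB image to Y, Cb, Cr planes (JFIF/BT.601),
    with the chroma planes down-sampled 2x horizontally and vertically (4:2:0).
    Assumes even image dimensions for brevity."""
    r, g, b = [rgb[..., i].astype(np.float64) for i in range(3)]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    # 4:2:0 -- average every 2x2 neighbourhood of the chroma planes
    cb = cb.reshape(cb.shape[0] // 2, 2, cb.shape[1] // 2, 2).mean(axis=(1, 3))
    cr = cr.reshape(cr.shape[0] // 2, 2, cr.shape[1] // 2, 2).mean(axis=(1, 3))
    return y, cb, cr
```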

Block Splitting: Block splitting is the process that splits the image into smaller blocks of 8×8 (or 16×16) dimension. The important thing to understand is why this is done.

Assume that we have a 256×256 image and split it into 1024 blocks of 8×8. Knowing that the computational complexity of the 2-dimensional DCT is O(N⁴) for the naive approach and O((N log₂N)²) with the Fast Cosine Transform, we still have

256⁴ ≫ (256 × log₂256)² ≫ 1024 × (8 × log₂8)²

which shows that it is advisable to split the image and perform the transform block by block.
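To make the numbers concrete, a quick back-of-the-envelope check (operation counts only, not an actual DCT):

```python
import math

full_naive  = 256 ** 4                        # naive 2-D DCT on the whole image
full_fast   = (256 * math.log2(256)) ** 2     # fast cosine transform on the whole image
blocks_fast = 1024 * (8 * math.log2(8)) ** 2  # fast transform on 1024 blocks of 8x8
print(full_naive, full_fast, blocks_fast)     # ~4.3e9 >> ~4.2e6 >> ~5.9e5
```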

But there is an inherent problem with this block splitting: it creates blocking artifacts in the reconstructed image, visible as a grid pattern at block boundaries, especially at low quality. Hence one needs a certain trade-off while choosing the block size. In case the image dimensions are not a perfect multiple of the block size, zero-padding is carried out.
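A minimal sketch of the splitting step (assuming a single NumPy plane and a fixed 8×8 block size; the padding value 0 matches the zero-padding mentioned above):

```python
import numpy as np

def split_into_blocks(plane, n=8):
    """Zero-pad a 2-D plane up to a multiple of n and return an array of n x n blocks."""
    h, w = plane.shape
    ph, pw = (-h) % n, (-w) % n                 # padding needed on bottom / right
    padded = np.pad(plane, ((0, ph), (0, pw)))  # zero-padding
    H, W = padded.shape
    return padded.reshape(H // n, n, W // n, n).swapaxes(1, 2).reshape(-1, n, n)
```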

Mean Shifting: To centre the pixel values around 0, we mean-shift by subtracting 128 from every pixel.

Discrete Cosine Transform: The 2-dimensional Type-II DCT is carried out on these blocks of each layer. The DCT has a property called energy compaction, which concentrates most of a block's energy into a few low-frequency coefficients and leaves the remaining values close to 0.
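A sketch of the mean shift plus the 2-D Type-II DCT on one 8×8 block, using SciPy's dct applied along both axes (the 'ortho' normalisation keeps the transform orthonormal):

```python
import numpy as np
from scipy.fft import dct

def forward_dct(block):
    """Mean-shift an 8x8 block of uint8 samples and apply the 2-D Type-II DCT."""
    shifted = block.astype(np.float64) - 128.0   # mean / level shift
    return dct(dct(shifted, axis=0, norm='ortho'), axis=1, norm='ortho')
```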

Quantization: This step is the reason for the LOSSY compression achieved in JPEG. Each of these DCT blocks is now quantized (i.e. divided element-wise and rounded) in accordance with quantization tables specified in the JPEG standard. Different tables are used for luminance and chrominance quantization, exploiting the fact that the human eye is more sensitive to luma than to chroma.
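A sketch of the quantization step, using the example luminance table from Annex K of the T.81 standard (roughly the "quality 50" table; real encoders scale it up or down for other quality settings):

```python
import numpy as np

# Example luminance quantization table from Annex K of ITU-T T.81
LUMA_QT = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(dct_block, table=LUMA_QT):
    """Element-wise divide by the table and round -- this is the lossy step."""
    return np.round(dct_block / table).astype(np.int32)
```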

Entropy Coding: The quantized values are scanned in a zig-zag fashion (starting from the top-left corner) and are then Huffman coded; once only zeros remain in a block, the sequence is terminated by a special code word called EOB (End Of Block). An interesting part to note here is the difference between the Progressive and Sequential methods of coding.

The Sequential method codes one quantized, zig-zag scanned block completely before moving to the next, while the Progressive mode tries to group and code the first, second, third, etc. coefficients of all blocks together, so a coarse version of the image can be displayed before the full data arrives.
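A sketch of the zig-zag scan and the trailing-zero truncation that the EOB code word stands for (the Huffman coding itself is omitted; the visiting order below is the standard 8×8 zig-zag):

```python
import numpy as np

def zigzag_indices(n=8):
    """Return the (row, col) visiting order of the standard zig-zag scan."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[1] if (rc[0] + rc[1]) % 2 == 0 else rc[0]))

def zigzag_scan(block):
    """Flatten a quantized 8x8 block in zig-zag order and drop the trailing zeros
    (the dropped run is what the EOB code word represents)."""
    flat = [block[r, c] for r, c in zigzag_indices(block.shape[0])]
    while flat and flat[-1] == 0:
        flat.pop()
    return flat
```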

Decoding: The decoding sequence is the exact reversal of the above procedure. Note that after the de-quantization step, the original DCT values that were close to 0 (and were rounded to 0 during quantization) stay at 0; this is where the loss occurs.
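Putting the inverse pieces together for one block (de-quantize, inverse DCT, undo the mean shift), a sketch of the decode path, assuming the same quantization table that was used while encoding (e.g. LUMA_QT from the sketch above):

```python
import numpy as np
from scipy.fft import idct

def decode_block(q_block, table):
    """Invert quantization and the 2-D DCT for one 8x8 block.
    Coefficients that were rounded to 0 stay 0 -- that is the information lost."""
    dequantized = q_block.astype(np.float64) * table
    spatial = idct(idct(dequantized, axis=0, norm='ortho'), axis=1, norm='ortho')
    return np.clip(np.round(spatial + 128.0), 0, 255).astype(np.uint8)
```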

Writing up how the bit-stream is laid out seems too boring for now 🙂 Please look it up on the Wiki page or in the original ITU T.81 standard document. Please post in the comments in case clarifications are required.
