MP3 – Compression Strategy

Continuing from the previous post, which gave a brief overview of MP3, we will now look at the actual compression method.

The time-domain PCM input samples are fed to a polyphase filterbank, which splits each block of 1152 samples into 32 equally spaced frequency sub-bands. Producing 32 full-rate sub-band signals would increase the number of samples 32-fold, so each sub-band is decimated by a factor of 32. In parallel, a 1024-point FFT and a 256-point FFT are computed on the input: the long FFT provides good frequency resolution, and the short one provides information on how the spectrum changes over time.
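To make the "32 samples in, 32 sub-band samples out" idea concrete, here is a minimal Python sketch modelled on the analysis filterbank structure described in the MPEG-1 audio standard (ISO/IEC 11172-3): a 512-sample FIFO, a windowing step, 64 partial sums and a 32 × 64 cosine matrix. It is an illustration under assumptions, not a conforming implementation. The real 512-coefficient analysis window is tabulated in the standard and is not reproduced here, so `WINDOW` below is a hypothetical sine-shaped stand-in, and `analyze_frame` is a name made up for this example. The thing to notice is that every 32 new input samples produce exactly one sample per sub-band, so the decimation by 32 is built into the structure, and 1152 input samples become 32 sub-bands of 36 samples each.

```python
import math

NUM_SUBBANDS = 32
FRAME_SAMPLES = 1152
FIFO_LEN = 512

# HYPOTHETICAL stand-in for the standard's 512-tap analysis window C[i].
# The real coefficients are tabulated in ISO/IEC 11172-3 and not reproduced here.
WINDOW = [math.sin(math.pi * (i + 0.5) / FIFO_LEN) / 8 for i in range(FIFO_LEN)]

# Cosine modulation matrix M[i][k] = cos((2i + 1)(k - 16) * pi / 64).
MATRIX = [[math.cos((2 * i + 1) * (k - 16) * math.pi / 64) for k in range(64)]
          for i in range(NUM_SUBBANDS)]


def analyze_frame(pcm, fifo=None):
    """Split 1152 PCM samples into 32 sub-bands of 36 samples each."""
    if len(pcm) != FRAME_SAMPLES:
        raise ValueError("expected 1152 samples per frame")
    fifo = list(fifo) if fifo is not None else [0.0] * FIFO_LEN
    subbands = [[] for _ in range(NUM_SUBBANDS)]
    for start in range(0, FRAME_SAMPLES, NUM_SUBBANDS):
        # Shift in 32 new samples (newest at index 0) and drop the 32 oldest.
        new = list(pcm[start:start + NUM_SUBBANDS])[::-1]
        fifo = new + fifo[:FIFO_LEN - NUM_SUBBANDS]
        # Window the FIFO, then fold it into 64 partial sums.
        z = [WINDOW[i] * fifo[i] for i in range(FIFO_LEN)]
        y = [sum(z[i + 64 * j] for j in range(8)) for i in range(64)]
        # Matrixing: 32 new input samples -> one output sample per sub-band,
        # which is where the decimation by 32 happens.
        for sb in range(NUM_SUBBANDS):
            subbands[sb].append(sum(MATRIX[sb][k] * y[k] for k in range(64)))
    return subbands, fifo
```

Each call returns the sub-band samples together with the FIFO state; passing that state into the next call keeps the filter history continuous from one frame to the next.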

The FFT data is fed to the psychoacoustic model block, whose algorithms model human sound perception. This block decides which window is to be used for the signals in the 32 sub-bands; the MPEG standard defines four window types: Normal, Start, Stop and Short. It also detects dominant tonal components and calculates the masking thresholds of the critical bands. These thresholds are computed per scalefactor band, and they set the limit within which the quantization noise in each band must be kept.
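The Start and Stop windows exist because a long window cannot be followed directly by a short one, or a short one by a long one, while keeping the overlapping window halves matched. The sketch below shows one simple way to turn per-granule "use short blocks here" decisions into a legal window sequence. The `attack_flags` input is hypothetical (it stands for whatever transient detection the psychoacoustic model performs), and forcing a lone long granule between two short ones to Short is an encoder choice assumed here, not something taken from the post.

```python
def assign_window_types(attack_flags):
    """Map per-granule 'needs short blocks' flags to MP3 window types.

    attack_flags is a hypothetical input: True where the psychoacoustic
    model has decided a granule needs short windows.
    """
    n = len(attack_flags)
    types = []
    for i, attack in enumerate(attack_flags):
        prev_short = i > 0 and attack_flags[i - 1]
        next_short = i + 1 < n and attack_flags[i + 1]
        if attack:
            types.append("Short")
        elif prev_short and next_short:
            # A lone long granule cannot be both Stop and Start, so it is
            # kept Short (an encoder choice assumed for this sketch).
            types.append("Short")
        elif next_short:
            types.append("Start")   # long -> short transition
        elif prev_short:
            types.append("Stop")    # short -> long transition
        else:
            types.append("Normal")
    return types


print(assign_window_types([False, True, False, False]))
# ['Start', 'Short', 'Stop', 'Normal']
```

Because the Start window has to be chosen for the granule before the Short one, the encoder needs to look one granule ahead when making this decision.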

Next, windowing is applied to each of the 32 sub-bands, and a Modified DCT (MDCT) is applied to each time frame of sub-band samples. This splits each sub-band into 18 finer frequency lines, creating a granule with a total of 32 × 18 = 576 spectral lines.
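For long (Normal) blocks, the MDCT takes 36 windowed sub-band samples, the 18 from the previous granule plus the 18 from the current one, and returns 18 coefficients, so 32 sub-bands × 18 lines gives the 576 lines of a granule. The sketch below uses the direct MDCT formula with the sine window used for Normal blocks. It deliberately skips short blocks and the alias-reduction step that follows the MDCT in a real encoder, and the function names are made up for this example.

```python
import math

N = 36          # long-block MDCT input: 18 previous + 18 current sub-band samples
HALF = N // 2   # 18 output lines per sub-band

# Sine window for Normal (long) blocks: w[k] = sin(pi / 36 * (k + 0.5)).
WINDOW = [math.sin(math.pi / N * (k + 0.5)) for k in range(N)]


def mdct_long(prev18, cur18):
    """Direct O(N^2) MDCT of one sub-band: 36 windowed samples -> 18 lines."""
    z = [w * s for w, s in zip(WINDOW, list(prev18) + list(cur18))]
    return [sum(z[k] * math.cos(math.pi / (2 * N) * (2 * k + 1 + HALF) * (2 * i + 1))
                for k in range(N))
            for i in range(HALF)]


def granule_lines(prev_subbands, cur_subbands):
    """32 sub-bands x 18 samples -> 576 spectral lines (long blocks only)."""
    lines = []
    for prev, cur in zip(prev_subbands, cur_subbands):
        lines.extend(mdct_long(prev, cur))
    return lines
```

The 50% overlap (each call reuses the previous granule's 18 samples) is what lets the decoder cancel the time-domain aliasing of neighbouring blocks.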

Scaling and non-uniform quantization are then applied to these 576 spectral values at a time. This is done in two nested loops: the distortion control loop (outer) and the rate control loop (inner).
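The "non-uniform" part means the quantizer follows a power law rather than a straight line: in the reference encoder description the magnitude is raised to the power 3/4 before rounding, and the decoder raises it to 4/3 to undo this, so larger values are quantized more coarsely. A minimal sketch, assuming a step size that grows by a factor of 2^(1/4) per increment and the 0.0946 rounding offset of the ISO reference encoder description; both are worth checking against the standard before relying on them, and the function names are hypothetical.

```python
import math


def quantize(xr, step):
    """Power-law quantizer: ix = nint((|x| / 2^(step/4))^0.75 - 0.0946)."""
    scale = 2.0 ** (step / 4.0)  # assumed step-size form
    # floor(v + 0.4054) is the same as rounding (v - 0.0946) to nearest.
    return [int(math.copysign(math.floor((abs(x) / scale) ** 0.75 + 0.4054), x))
            for x in xr]


def dequantize(ix, step):
    """Inverse mapping as a decoder would do it: |ix|^(4/3) * 2^(step/4)."""
    scale = 2.0 ** (step / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0) * scale, q) for q in ix]


print(quantize([0.0, 1.0, -8.0], 0))   # [0, 1, -5]
print(quantize([100.0], 0), quantize([100.0], 8))   # [32] [11]
```

Raising `step` by 8 divides the input by 4 before the power law, which is why 100.0 drops from 32 to 11 quantization levels: fewer levels means fewer bits, at the cost of more noise.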

It is enough to understand here that the rate control loop quantizes the frequency-domain samples and determines the required quantization step size. The step size is increased until the number of bits needed to Huffman code the quantized values is no more than the number of bits available (set initially by the CBR bit-rate requirement). The distortion control loop, on the other hand, manages the quantization noise and keeps it below the masking threshold computed for each scalefactor band.
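The two nested loops can be sketched as follows. This is a simplified illustration, not the standard's exact procedure: `estimate_bits` is a hypothetical stand-in for counting bits with the real Huffman tables, the scalefactor bands and their allowed noise are supplied by the caller, an amplification step of 2^(1/2) per scalefactor increment is assumed, and the outer loop simply stops after a fixed number of iterations instead of applying the standard's full set of stop conditions.

```python
import math


def quantize(xr, step):
    # Power-law quantizer with step size 2^(step/4) (assumed form).
    scale = 2.0 ** (step / 4.0)
    return [int(math.copysign(math.floor((abs(x) / scale) ** 0.75 + 0.4054), x))
            for x in xr]


def dequantize(ix, step):
    scale = 2.0 ** (step / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0) * scale, q) for q in ix]


def estimate_bits(ix):
    # HYPOTHETICAL stand-in for the Huffman tables: Elias-gamma-like length + sign bit.
    return sum(2 * abs(q).bit_length() + 1 + (1 if q else 0) for q in ix)


def rate_loop(xr, available_bits):
    """Inner loop: increase the step size until the coded size fits the budget."""
    step = 0
    while True:
        ix = quantize(xr, step)
        # Stop when the budget is met, or when everything has quantized to zero
        # (a coarser step cannot save any more bits).
        if estimate_bits(ix) <= available_bits or not any(ix):
            return ix, step
        step += 1


def outer_loop(xr, bands, allowed_noise, available_bits, max_iter=16):
    """Outer loop: amplify scalefactor bands whose noise exceeds the threshold.

    bands: list of (start, end) index ranges; allowed_noise: one energy per band.
    """
    amp = [0] * len(bands)  # amplification steps per band (the "scalefactors")
    for it in range(max_iter):
        gains = [2.0 ** (a / 2.0) for a in amp]  # 2^(1/2) per step (assumed)
        scaled = list(xr)
        for g, (lo, hi) in zip(gains, bands):
            for i in range(lo, hi):
                scaled[i] = xr[i] * g
        ix, step = rate_loop(scaled, available_bits)
        rec = dequantize(ix, step)
        # Quantization noise per band, measured in the original (unamplified) scale.
        over = [b for b, (g, (lo, hi)) in enumerate(zip(gains, bands))
                if sum((xr[i] - rec[i] / g) ** 2 for i in range(lo, hi)) > allowed_noise[b]]
        if not over or it == max_iter - 1:
            return ix, step, amp
        for b in over:
            amp[b] += 1
```

Amplifying a noisy band makes its values quantize more finely, which costs bits; the inner loop then raises the global step size to stay within budget. That tug-of-war between the two loops is what the nesting resolves.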

The quantized values that finally satisfy both criteria are Huffman coded. The Huffman-coding information produced by the loops is stored in the side information of each frame. Finally, bit-stream formatting and CRC generation are carried out, and the encoder outputs frames that each represent 1152 samples.
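The CRC is what lets a decoder detect corrupted frames. As far as we understand the MPEG-1 audio standard, it is a 16-bit CRC with generator polynomial x^16 + x^15 + x^2 + 1 and an initial register value of 0xFFFF, computed over the last 16 bits of the frame header and the side information. Here is a minimal bitwise sketch of such a CRC; `crc16_mpeg` is a name invented for this example, and assembling exactly which header and side-information bits are fed in is left out.

```python
MPEG_CRC_POLY = 0x8005  # generator x^16 + x^15 + x^2 + 1
MPEG_CRC_INIT = 0xFFFF


def crc16_mpeg(data: bytes, crc: int = MPEG_CRC_INIT) -> int:
    """Bitwise CRC-16, MSB first, no reflection, no final XOR."""
    for byte in data:
        crc ^= byte << 8           # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:       # top bit set: shift and subtract the polynomial
                crc = ((crc << 1) ^ MPEG_CRC_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

A quick self-check: appending the two CRC bytes (big-endian) to the data and running the routine again gives 0.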

Information about the decompression technique, and more details, can be found in this well-written paper: “The Theory Behind MP3” by Rassol Raissi.
