Continuing from the previous post giving a little information about MP3 in general, we will now look at the actual compression methodology.

The time-domain PCM input is fed to a **Polyphase Filterbank**, which splits the incoming 1152 samples into 32 equally spaced frequency sub-bands. On its own this would multiply the number of samples by 32, so each sub-band is decimated by 32. Simultaneously, a 1024-point and a 256-point **FFT** are carried out to give good frequency resolution and information on spectral change.
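The sample bookkeeping above can be checked with a few lines of Python. This is only a counting sketch, not a filterbank: `keep_every` is a hypothetical helper standing in for the decimation step, and the sub-band signals are placeholder lists rather than real filter outputs.

```python
FRAME_SAMPLES = 1152   # PCM samples per MP3 frame
NUM_SUBBANDS = 32      # equally spaced sub-bands from the polyphase filterbank


def keep_every(samples, factor):
    """Decimate by `factor`: keep one sample out of every `factor`."""
    return samples[::factor]


# Placeholder sub-band signals: before decimation, every sub-band
# carries as many samples as the input frame.
subbands = [[0.0] * FRAME_SAMPLES for _ in range(NUM_SUBBANDS)]
before = sum(len(band) for band in subbands)

decimated = [keep_every(band, NUM_SUBBANDS) for band in subbands]
after = sum(len(band) for band in decimated)

print(before)              # 36864, i.e. 32 x 1152
print(len(decimated[0]))   # 36 samples per sub-band
print(after)               # 1152, the same as the input frame
```

After decimation the filterbank is critically sampled: the 32 sub-bands together hold exactly as many samples as the frame that went in.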

This FFT data is fed to the **Psychoacoustic Model** block. Algorithms in this block model human sound perception. The block indicates which window is to be used for the signals in the 32 sub-bands; Normal, Start, Stop and Short are the window types defined in the MPEG standard. Dominant tonal components are detected and critical-band masking thresholds are calculated. The thresholds are given per *scalefactor band*, and the quantization noise in each of these bands is to be kept within its limit.

Now, **Windowing** is carried out on each of the 32 sub-bands. A **Modified DCT** is then applied to each time frame of sub-band samples, splitting every sub-band into 18 finer sub-bands and creating a *granule* with a total of 576 spectral lines (32 × 18).
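To make the 32 × 18 = 576 arithmetic concrete, here is a minimal sketch of the windowing and MDCT step for long (Normal) blocks. It assumes the textbook MDCT definition and a sine window over 36 samples (18 new samples overlapped with the previous 18). The function names are made up for this post, the input is placeholder data, and steps of a real encoder such as window switching and alias reduction are left out.

```python
import math

M = 18          # MDCT outputs (frequency lines) per sub-band
N = 2 * M       # 36 windowed input samples per transform (long block)


def sine_window(n_samples=N):
    """Sine window, assumed here for the Normal (long) block type."""
    return [math.sin(math.pi / n_samples * (n + 0.5)) for n in range(n_samples)]


def mdct(block):
    """Direct (slow) MDCT: 2*M input samples in, M coefficients out."""
    n_in = len(block)
    m_out = n_in // 2
    return [
        sum(
            block[n]
            * math.cos(math.pi / m_out * (n + 0.5 + m_out / 2) * (k + 0.5))
            for n in range(n_in)
        )
        for k in range(m_out)
    ]


def granule_lines(subband_blocks):
    """Window and transform each sub-band block, then concatenate the lines."""
    window = sine_window()
    lines = []
    for block in subband_blocks:
        windowed = [s * w for s, w in zip(block, window)]
        lines.extend(mdct(windowed))
    return lines


# 32 sub-bands, each contributing one 36-sample block (placeholder data).
blocks = [[math.sin(0.1 * n * (b + 1)) for n in range(N)] for b in range(32)]
spectrum = granule_lines(blocks)
print(len(spectrum))   # 576 = 32 sub-bands x 18 lines
```

The direct double loop is used only for readability; practical encoders use fast MDCT algorithms.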

**Scaling and Non-uniform Quantization** is now applied to these 576 spectral values at a time. This is done in two nested loops: the *distortion control loop* (outer) and the *rate control loop* (inner).

It's enough to understand here that the rate control loop quantizes the frequency-domain samples and also determines the required quantization step size. The quantization step is increased until the number of Huffman-coded bits fits within the total available bits (set initially by the CBR requirement). The distortion control loop, on the other hand, manages the quantization noise and keeps it below the masking threshold computed for each scalefactor band.
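The interplay of the two loops can be sketched roughly as below. Treat everything here as an assumption made for illustration: the power-law quantizer uses the 0.75 exponent and 0.0946 offset commonly quoted for Layer III, `estimate_bits` is a made-up stand-in for real Huffman coding, the band layout and thresholds are invented, and the amplification step of √2 per scalefactor increment is just one plausible choice.

```python
import math


def quantize(lines, step):
    """Power-law (non-uniform) quantization; a larger step is coarser."""
    scale = 2.0 ** (step / 4.0)
    return [
        int(math.copysign(max(0, round((abs(x) / scale) ** 0.75 - 0.0946)), x))
        for x in lines
    ]


def dequantize(q, step):
    """Inverse of the power law, as a decoder would apply it."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * 2.0 ** (step / 4.0), q)


def estimate_bits(quantized):
    """Hypothetical bit cost (NOT Huffman): zeros are free, big values cost more."""
    return sum(2 * abs(q).bit_length() + 1 for q in quantized if q != 0)


def rate_loop(lines, available_bits, step=0):
    """Inner loop: raise the step size until the coded size fits the budget."""
    while True:
        quantized = quantize(lines, step)
        if estimate_bits(quantized) <= available_bits:
            return quantized, step
        step += 1


def distortion_loop(lines, bands, thresholds, available_bits, max_iters=16):
    """Outer loop: amplify bands whose noise exceeds the masking threshold."""
    scalefactors = [0] * len(bands)
    for _ in range(max_iters):
        gains = [2.0 ** (0.5 * sf) for sf in scalefactors]
        scaled = list(lines)
        for (lo, hi), gain in zip(bands, gains):
            for i in range(lo, hi):
                scaled[i] = lines[i] * gain
        quantized, step = rate_loop(scaled, available_bits)
        # Measure the noise per band in the original (unscaled) domain.
        noisy = [
            b
            for b, ((lo, hi), gain) in enumerate(zip(bands, gains))
            if sum(
                (lines[i] - dequantize(quantized[i], step) / gain) ** 2
                for i in range(lo, hi)
            ) > thresholds[b]
        ]
        if not noisy:
            break
        for b in noisy:
            scalefactors[b] += 1
    return quantized, step, scalefactors


# Invented example: 24 lines in 3 bands, a 120-bit budget.
lines = [1000.0 * math.sin(0.3 * i) / (1 + i) for i in range(24)]
bands = [(0, 8), (8, 16), (16, 24)]
thresholds = [50.0, 50.0, 50.0]
quantized, step, scalefactors = distortion_loop(lines, bands, thresholds, 120)
print(step, scalefactors, estimate_bits(quantized))
```

Note that the outer loop may give up after `max_iters` passes with some bands still above threshold; the bit budget always wins, which matches the idea that the rate control loop enforces the hard limit.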

These quantized values, having finally satisfied the required criteria, are now Huffman coded. The Huffman-related side information produced by the loops is stored in the *side information* of each frame. Finally, **bit-stream formatting** and **CRC generation** are carried out, and frames representing 1152 samples each are put out.
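Since every frame represents 1152 samples, a constant bit rate fixes how many bytes each frame may occupy. The sketch below uses the usual frame-length formula for MPEG-1 Layer III; the 128 kbps bit rate and 44.1 kHz sample rate are example values chosen here, not figures from this post.

```python
SAMPLES_PER_FRAME = 1152


def frame_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Frame length in bytes: 144 * bitrate / sample_rate, plus a padding byte."""
    return (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate_hz + padding


def frame_duration_ms(sample_rate_hz):
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


print(frame_bytes(128000, 44100))            # 417
print(frame_bytes(128000, 44100, padding=1)) # 418
print(round(frame_duration_ms(44100), 2))    # 26.12
```

At 44.1 kHz the division does not come out even, which is why an encoder occasionally adds a padding byte to keep the average bit rate on target.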

Information about the decompression technique, along with more details, can be found in this well-written paper: “The Theory Behind MP3” by Rassol Raissi.
