MP3 – All you wanted to know!

Today, we’ll have a look at the most famous kind of multimedia content: MP3 files. Firstly, MP3 is not MPEG-3 but MPEG-1 Layer 3. The MPEG (Moving Picture Experts Group) defined three audio-only compression schemes, called Layer 1, 2, and 3, in increasing order of complexity and performance. Layer 3 achieves a whopping compression from about 1.4Mbps (CD audio) to 128kbps (roughly 11:1) with little audible degradation for most listeners, hence its popularity. The standard is numbered ISO/IEC 11172-3 and was published in 1993.
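The compression figure is easy to check with a little arithmetic. This is a minimal sketch, assuming the uncompressed source is CD audio (44.1kHz, 16 bits per sample, stereo); the variable names are only for illustration. The ratio comes out at about 11:1, in the same ballpark as the commonly quoted figure.

```python
# Assumed source format: CD audio (44.1 kHz, 16-bit samples, 2 channels).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Raw PCM bit rate in bits per second.
cd_bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

# Typical MP3 bit rate mentioned in the text.
mp3_bitrate_bps = 128_000

ratio = cd_bitrate_bps / mp3_bitrate_bps
print(f"CD audio: {cd_bitrate_bps / 1000:.1f} kbps")
print(f"Compression ratio at 128 kbps: {ratio:.1f}:1")
```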

MP3 is a perceptual codec, by which we mean that it exploits the fact that the human ear is not sensitive to much of the data (sounds) in a signal, and that it is also subject to simultaneous (tonal) and temporal masking. The ear’s frequency resolution is usually modelled as about 24 critical bands. When two tones are played at the same time and the stronger one in a band is above the masking threshold, it’s not possible for our ear to hear the weaker frequency components in that band. This is simultaneous, or tonal, masking. Temporal masking, on the other hand, occurs when a loud sound and a softer one are close together in time rather than simultaneous: the stronger (in volume) sound hides softer sounds for a short period before it (pre-masking, at most ~50ms and often much shorter) and after it (post-masking, ~100ms).

One important point is that MP3 is most commonly encoded at 128kbps but can use other bit-rates too (MPEG-1 Layer 3 allows 32 to 320kbps). The bit rate is an important parameter of the encoding procedure. Also possible is Variable Bit Rate (VBR), which lets the encoder choose a bit rate for every frame. Its drawbacks are that a decoder / player may display incorrect timing, and that it can cause problems with broadcasting.
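To see why VBR can confuse a player’s time display, here is a rough sketch. It assumes 1152-sample frames at 44.1kHz, the usual MPEG-1 Layer 3 frame-length formula (144 × bit rate / sample rate, padding ignored), and a hypothetical player that guesses duration from the file size and the bit rate of the first frame only. The function names are made up for this example.

```python
SAMPLE_RATE_HZ = 44_100
FRAME_SECONDS = 1152 / SAMPLE_RATE_HZ  # duration of one frame


def frame_bytes(bitrate_kbps):
    """Unpadded length in bytes of one MPEG-1 Layer 3 frame."""
    return 144 * bitrate_kbps * 1000 // SAMPLE_RATE_HZ


def naive_duration(frame_bitrates_kbps):
    """Duration guessed from total size and the first frame's bit rate."""
    total_bytes = sum(frame_bytes(b) for b in frame_bitrates_kbps)
    return total_bytes * 8 / (frame_bitrates_kbps[0] * 1000)


def true_duration(frame_bitrates_kbps):
    """Every frame plays for the same time, whatever its bit rate."""
    return len(frame_bitrates_kbps) * FRAME_SECONDS


cbr = [128] * 1000                # constant bit rate
vbr = [64] * 500 + [192] * 500    # quiet start, busy ending

for name, frames in (("CBR", cbr), ("VBR", vbr)):
    print(f"{name}: guessed {naive_duration(frames):.1f}s, "
          f"actual {true_duration(frames):.1f}s")
```

For the constant bit rate stream the guess is close to the real duration; for the variable one it is badly off, which is why VBR files often carry extra information so that players do not have to guess.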

MP3 audio is divided into frames of 1152 samples each, which last about 26ms at a 44.1kHz sampling rate. This gives a frame rate of about 38 frames / second. Each frame consists of a 4-byte header, an optional 16-bit CRC, and side information that is needed to decode the main data. This is followed by the main data, which holds the scale factors for the scalefactor bands (used to shape the quantization noise), the Huffman-coded bits, and optional ancillary data.
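As an illustration of the frame layout, the sketch below decodes the 4-byte header of a frame. It only handles MPEG-1 Layer 3, the lookup tables are limited to that case, and the function name `parse_header` and the returned dictionary keys are invented for this example rather than taken from any library.

```python
# Bit rate table for MPEG-1 Layer 3 (index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
# Sampling rate table for MPEG-1 (index 3 is reserved).
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]
SAMPLES_PER_FRAME = 1152


def parse_header(header: bytes) -> dict:
    """Decode the 4-byte header of an MPEG-1 Layer 3 frame."""
    if len(header) != 4:
        raise ValueError("header must be exactly 4 bytes")
    word = int.from_bytes(header, "big")
    if word >> 21 != 0x7FF:                  # 11 sync bits, all ones
        raise ValueError("frame sync not found")
    version = (word >> 19) & 0b11            # 0b11 means MPEG-1
    layer = (word >> 17) & 0b11              # 0b01 means Layer 3
    if version != 0b11 or layer != 0b01:
        raise ValueError("not an MPEG-1 Layer 3 frame")
    has_crc = ((word >> 16) & 1) == 0        # protection bit 0: CRC follows
    bitrate_kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate_hz = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    padding = (word >> 9) & 1
    if bitrate_kbps is None or sample_rate_hz is None:
        raise ValueError("free-format or invalid bit rate / sample rate")
    return {
        "bitrate_kbps": bitrate_kbps,
        "sample_rate_hz": sample_rate_hz,
        "has_crc": has_crc,
        # Frame length in bytes, header included.
        "frame_bytes": 144 * bitrate_kbps * 1000 // sample_rate_hz + padding,
        "duration_ms": 1000 * SAMPLES_PER_FRAME / sample_rate_hz,
    }


# A typical header: MPEG-1 Layer 3, no CRC, 128 kbps, 44.1 kHz, no padding.
info = parse_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
print(info)
print(f"frames per second: {1000 / info['duration_ms']:.2f}")
```

The duration works out to 1152 / 44100 seconds, which is where the 26ms and 38 frames / second figures come from.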

Finally, a 128-byte tag known as ID3 can be appended to the end of the file to hold textual information. It is laid out as ‘TAG’ (3), title (30), artist (30), album (30), year (4), comment (28), an all-zero byte (1), track number (1) and genre (1), with the size of each field in bytes given in parentheses. The next step is ID3v2, a newer standard. It is placed at the beginning of a file and has a dynamic size, so it does not impose restrictions on information size.
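Because the 128-byte tag has fixed field sizes, it maps directly onto a `struct` format string. The following is a minimal sketch under a few assumptions: text fields are treated as Latin-1 and cut at the first zero byte, and the names `parse_id3v1` and `read_id3v1` are just illustrative helpers, not part of any library.

```python
import struct

# 'TAG', title, artist, album, year, comment, zero byte, track, genre.
ID3V1 = struct.Struct("<3s30s30s30s4s28sBBB")  # 128 bytes in total


def parse_id3v1(tag: bytes):
    """Return the fields of a 128-byte ID3 tag, or None if it is not one."""
    if len(tag) != ID3V1.size or tag[:3] != b"TAG":
        return None
    _, title, artist, album, year, comment, zero, track, genre = ID3V1.unpack(tag)

    def text(raw: bytes) -> str:
        # Assumption: fields are zero-padded Latin-1 text.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        # The track number is only meaningful when the byte before it is zero.
        "track": track if zero == 0 and track != 0 else None,
        "genre": genre,
    }


def read_id3v1(path: str):
    """Read the last 128 bytes of a file and try to parse them as a tag."""
    with open(path, "rb") as f:
        f.seek(0, 2)                 # jump to the end to learn the size
        if f.tell() < ID3V1.size:
            return None
        f.seek(-ID3V1.size, 2)
        return parse_id3v1(f.read(ID3V1.size))


# Build a tag by hand to show the layout.
sample_tag = (b"TAG" + b"Song".ljust(30, b"\x00") + b"Band".ljust(30, b"\x00")
              + b"Album".ljust(30, b"\x00") + b"1999"
              + b"hello".ljust(28, b"\x00") + bytes([0, 5, 17]))
print(parse_id3v1(sample_tag))
```

The genre is stored as a single number that indexes a predefined list of genre names, which is left out here to keep the example short.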

We shall continue with the actual compression strategy in the next post as this one has grown too long already! 😉


