There seems to be some confusion about the kind of compression mp3 uses. It isn't based on finding repeated clips or any such time-domain analysis. All the work gets done in the frequency domain, on a sequence of spectra, one per frame.

My understanding of how it works is that it splits the wave into its frequency components with a DCT (discrete cosine transform). Strictly speaking, mp3 uses a polyphase filterbank followed by a modified DCT (the MDCT), but the idea is the same. A DCT is like an FFT (fast Fourier transform, which gives you a spectral display) but with the nice property that there are no imaginary numbers floating about. The encoder then takes into account a property of the ear: you cannot hear quiet frequencies which are close to loud ones. This effect is called 'psychoacoustic masking'. For these masked frequencies, it can use less accuracy. All the coefficients are then quantised (have their bit accuracy reduced) depending on their required accuracy, and Huffman coded. Quantisation means that there are fewer 'symbols' for the Huffman code to deal with, so the data compresses. The amount of quantisation is chosen so that the compressed frame comes out at the right size for the target bitrate.
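To make the transform, quantise, entropy-code chain a bit more concrete, here is a rough Python sketch. It is only a toy under stated assumptions: a naive DCT-II stands in for the real transform, Shannon entropy stands in for the Huffman-coded size, and masking is left out entirely. The names (dct, quantise, estimated_bits, fit_to_budget) and the 1.25 growth factor are made up for illustration, not taken from any real encoder.

```python
import math
from collections import Counter

def dct(frame):
    """Naive DCT-II: a real-valued spectrum for one frame (stand-in transform)."""
    n = len(frame)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(frame))
            for k in range(n)]

def quantise(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of step."""
    return [round(c / step) for c in coeffs]

def estimated_bits(symbols):
    """Entropy times symbol count: a lower bound on the Huffman-coded size."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())

def fit_to_budget(coeffs, budget_bits, step=1.0):
    """Coarsen the quantiser until the frame fits the bit budget (assumed rule)."""
    while estimated_bits(quantise(coeffs, step)) > budget_bits:
        step *= 1.25
    return step

# One 64-sample frame made of two sine waves.
n = 64
frame = [math.sin(2 * math.pi * 5 * i / n) + 0.3 * math.sin(2 * math.pi * 13 * i / n)
         for i in range(n)]
coeffs = dct(frame)

fine = quantise(coeffs, 0.01)    # small step: many distinct symbols
coarse = quantise(coeffs, 4.0)   # large step: few distinct symbols
print(len(set(fine)), len(set(coarse)))

step = fit_to_budget(coeffs, budget_bits=100)
print(step, estimated_bits(quantise(coeffs, step)))
```

The larger the step, the fewer distinct symbols are left and the lower the estimated size, which is the whole trade: accuracy for bits. A real encoder would pick different steps for different frequency bands rather than one global step.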

Ok, that description was probably full of technical inaccuracies, seeing as I'm not completely clued up on all aspects, just the basics :)

At any rate, high frequencies are no more difficult to compress than low frequencies. It's the number of frequencies which are loud in relation to each other that determines how well a frame compresses. The absolute worst case, as somebody else here pointed out, is white noise, which contains all frequencies in equal measure, and a cymbal crash is remarkably similar to that.
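A quick way to see this for yourself, sketched in Python. Again this is a toy: the naive DCT-II is a stand-in for the real transform, and loud_components with its 0.1 ratio (roughly 20 dB below the peak) is just an assumed way of counting 'frequencies which are loud in relation to each other'. The tones are deliberately chosen to line up with a single DCT bin.

```python
import math
import random

def dct(frame):
    """Naive DCT-II: a real-valued spectrum for one frame (stand-in transform)."""
    n = len(frame)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(frame))
            for k in range(n)]

def loud_components(frame, ratio=0.1):
    """Count coefficients whose magnitude is at least ratio times the peak."""
    mags = [abs(c) for c in dct(frame)]
    peak = max(mags)
    return sum(1 for m in mags if m >= ratio * peak)

n = 64
# Pure tones sitting exactly on DCT bin 3 (low) and bin 50 (high).
low_tone = [math.cos(math.pi / n * (i + 0.5) * 3) for i in range(n)]
high_tone = [math.cos(math.pi / n * (i + 0.5) * 50) for i in range(n)]
# White noise from a seeded generator so the run is repeatable.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

print(loud_components(low_tone))   # one loud bin
print(loud_components(high_tone))  # also one loud bin
print(loud_components(noise))      # most of the spectrum
```

The low tone and the high tone each occupy a single bin, so neither is harder than the other. The noise frame lights up most of the spectrum, which leaves the encoder nothing it can throw away cheaply.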

The effect you hear when mp3 compression can't cope is 'noise modulation'. Imagine standing a distance off from a waterfall. After a while, your ear will filter out the sound and you won't be concentrating on it. If, however, the waterfall kept turning itself on and off, you would be most alarmed. Quantisation gives this effect, but mp3 attempts to get around it by quantising masked frequencies more than those which aren't. With white noise, though, nothing is masked, so all it can do is quantise everything - and you get that weird 'swishing' noise (very technical term).
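Here is a very crude Python sketch of the 'quantise masked frequencies more' idea. To be clear, this is not the real psychoacoustic model: the rule (a bin counts as masked if a neighbour within 2 bins is more than 8 times louder) and the names masked_bins and step_sizes are assumptions picked purely for illustration.

```python
def masked_bins(mags, reach=2, ratio=8.0):
    """Toy rule: a bin is masked if a nearby bin is more than ratio times louder."""
    flags = []
    for k, m in enumerate(mags):
        lo = max(0, k - reach)
        hi = min(len(mags), k + reach + 1)
        flags.append(any(j != k and mags[j] > ratio * m for j in range(lo, hi)))
    return flags

def step_sizes(mags, base=0.5, coarse=4.0):
    """Give masked bins a coarse quantiser step and audible bins a fine one."""
    return [coarse if masked else base for masked in masked_bins(mags)]

# One loud tone with quiet neighbours: the neighbours can be quantised coarsely.
tonal = [0.1, 0.2, 10.0, 0.3, 0.1, 0.05, 0.05, 0.05]
print(masked_bins(tonal))
print(step_sizes(tonal))

# A flat, noise-like spectrum: nothing masks anything else.
flat = [1.0] * 8
print(masked_bins(flat))
print(step_sizes(flat))
```

With the tonal spectrum, the bins next to the loud one get the coarse step and cost very little. With the flat spectrum no bin is masked, so the only way to save bits is to raise the step everywhere, and that is where the audible noise comes from.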


- John.

(The above may not represent the views of empeg :)