
mp3 audio quality

Introduction

What is mp3?

Mp3 is the colloquial term for MPEG-1 Layer III, a popular audio compression algorithm. The codec is specified in part 3 of ISO 11172 (ISO/IEC 11172-3). This specification describes the mp3 bitstream; it explicitly does not describe or prescribe an encoding process. Encoders that convert an uncompressed audio stream into a compressed bitstream may therefore differ significantly in efficiency and quality.
Mp3 is not lossless: the result after decompression is not identical to the original. This is in contrast to lossless compression algorithms such as FLAC.
Below we present some comparisons between original and reconstructed (encoded, then decoded) audio files.

MPEG-1 audio Layers

Three codecs

ISO 11172 describes three audio compression standards: Layer I, II and III, or mp1, mp2 and mp3 for short. Mp1 is effectively unused. It is a simplified version of mp2, which is widely used, for instance on DVDs and in digital television.
Be aware that if you can choose between buying a CD or a DVD of a musical performance, the former will probably give you the better audio quality: a CD at least guarantees 16-bit uncompressed stereo sampled at 44.1 kHz.

Read audible.pdf for a comparison between CD-quality and original tape recordings.

Frames

In mp2 and mp3, audio is encoded and decoded in chunks of 1152 samples, called frames. In mp2 there is a 1:1 relation between encoded and decoded frames. In mp3, however, the encoder can 'borrow' bits: a frame may use bits left unused by preceding frames (the so-called bit reservoir) to encode a difficult passage. This can give better quality and/or a higher compression ratio. The price is that the compressed bitstream can no longer be edited frame by frame.
Because mp3 encoding is more flexible, there are also larger differences between individual mp3 encoders. The decoding process is deterministic, so differences between decoders should be negligible.
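The frame arithmetic used throughout this page can be sketched in a few lines of Python. The 1152-sample frame comes from the text. The frame-length formula (144 × bitrate / sampling_frequency bytes, plus one optional padding byte) is the standard one for MPEG-1 Layer III. The function names are illustrative only, not part of any tool used here.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer II and Layer III


def frame_count(samples: int) -> int:
    """Number of complete 1152-sample frames in a stream."""
    return samples // SAMPLES_PER_FRAME


def duration_seconds(samples: int, sampling_frequency: int = 44100) -> float:
    """Playing time of a stream with the given number of samples per channel."""
    return samples / sampling_frequency


def layer3_frame_bytes(bitrate: int, sampling_frequency: int = 44100,
                       padding: bool = False) -> int:
    """Length in bytes of one MPEG-1 Layer III frame, header included.

    1152 samples / 8 bits per byte = 144, hence 144 * bitrate / fs.
    """
    return 144 * bitrate // sampling_frequency + (1 if padding else 0)


if __name__ == "__main__":
    # Sample counts of mono.wav and fragment.wav as given further down.
    for samples in (5028480, 2226816):
        print(samples, frame_count(samples), round(duration_seconds(samples), 5))
    # At 128 kbit/s and 44.1 kHz a frame is 417 or 418 bytes, depending on padding.
    print(layer3_frame_bytes(128000), layer3_frame_bytes(128000, padding=True))
```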

Fundamental differences between audio- and video-compression

When compressing video, of a football match say, we can exploit redundancy both within and between pictures.
If the green pitch is visible, we can start by assigning an averaged green colour to large parts of the picture and then efficiently encode the structure of the grass up to a certain level of detail. We can also expect the same pitch to appear in the next picture.
Audio, however, is fundamentally different. Although many audio streams are repetitive on timescales from milliseconds to many seconds, it is difficult to squeeze out these redundancies. MPEG audio encoding (compression) therefore relies heavily on a psychoacoustic model, based on the idea that some combinations and sequences of sounds are not consciously perceived.
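To make the idea of a psychoacoustic model a little more concrete, here is one of its simplest ingredients: the absolute threshold of hearing (threshold in quiet), using Terhardt's widely cited approximation. This is only an illustration. It is not the model LAME uses, it ignores masking of one sound by another (the more important effect in real encoders), and the function names are hypothetical.

```python
import math


def absolute_threshold_db_spl(frequency_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    f = frequency_hz / 1000.0  # frequency in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


def is_audible(frequency_hz: float, level_db_spl: float) -> bool:
    """A pure tone below the threshold in quiet is not perceived at all."""
    return level_db_spl > absolute_threshold_db_spl(frequency_hz)


if __name__ == "__main__":
    for f in (100, 1000, 3300, 15000):
        print(f, round(absolute_threshold_db_spl(f), 1))
    # A 15 kHz component at 20 dB SPL lies below the threshold,
    # so an encoder need not spend bits on it.
    print(is_audible(15000, 20.0))
```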

Encoding options

The mp3 encoding standard allows you to choose a number of encoding parameters. One obvious choice is between mono and stereo sound; in stereo we can often gain efficiency by encoding the average and the difference of the left and right channels separately.
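Encoding the average and difference of the two channels (often called mid/side stereo) is exactly reversible, as this small sketch shows. When left and right are similar, the side signal is small and therefore cheap to encode. The function names are illustrative and this is not LAME's implementation; the standard's own mid/side stereo scales by 1/√2 instead of 1/2, which does not change the principle.

```python
def to_mid_side(left, right):
    """Average (mid) and half-difference (side) of two channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Exact inverse: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [1000, 2000, -1500, 300]
    right = [980, 2010, -1490, 310]  # nearly identical to left
    mid, side = to_mid_side(left, right)
    print(side)  # small values: little information left to encode
    print(from_mid_side(mid, side) == (left, right))
```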
More important is the choice of the bitrate, which determines the compression ratio. The higher the compression ratio, the larger the role of the psychoacoustic model.
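For reference, the compression ratio relative to uncompressed CD audio (44.1 kHz, 16 bit, 2 channels, i.e. 1411.2 kbit/s) follows directly from the bitrate. A minimal sketch, with an illustrative function name:

```python
CD_BITRATE = 44100 * 16 * 2  # bit/s, uncompressed 16-bit stereo at 44.1 kHz


def compression_ratio(mp3_bitrate_kbps: int) -> float:
    """How many times smaller an mp3 stream is than CD audio."""
    return CD_BITRATE / (mp3_bitrate_kbps * 1000)


if __name__ == "__main__":
    for kbps in (64, 128, 192, 256, 320):
        print(f"{kbps:3d} kbit/s -> {compression_ratio(kbps):5.2f} : 1")
```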
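Only certain bitrates are allowed according to ISO 11172-3. The single_channel annotations below are the mode restrictions of Layer II (mp2); Layer III (mp3) allows each of its bitrates in every mode: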

• 32 kbit/s single_channel only
• 40 kbit/s mp3 only
• 48 kbit/s single_channel only
• 56 kbit/s single_channel only
• 64 kbit/s
• 80 kbit/s single_channel only
• 96 kbit/s
• 112 kbit/s
• 128 kbit/s
• 160 kbit/s
• 192 kbit/s
• 224 kbit/s not single_channel
• 256 kbit/s not single_channel
• 288 kbit/s mp1 only
• 320 kbit/s not single_channel
• 352 kbit/s mp1 only
• 384 kbit/s mp1 and mp2 only, not single_channel
• 416 kbit/s mp1 only
• 448 kbit/s mp1 only
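The MPEG-1 bitrate tables and the Layer II mode restrictions can be written down as data. This is a sketch with hypothetical names; it covers only the MPEG-1 tables (free format excluded), not every detail of the standard.

```python
# Allowed bitrates (kbit/s) per MPEG-1 audio layer, free format excluded.
BITRATES = {
    1: (32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448),
    2: (32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384),
    3: (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320),
}

# Layer II only: some bitrates are tied to, or excluded from, single_channel mode.
LAYER2_SINGLE_CHANNEL_ONLY = {32, 48, 56, 80}
LAYER2_NOT_SINGLE_CHANNEL = {224, 256, 320, 384}


def is_allowed(layer: int, kbps: int, mode: str) -> bool:
    """mode is 'stereo', 'joint_stereo', 'dual_channel' or 'single_channel'."""
    if kbps not in BITRATES[layer]:
        return False
    if layer == 2:
        if kbps in LAYER2_SINGLE_CHANNEL_ONLY:
            return mode == "single_channel"
        if kbps in LAYER2_NOT_SINGLE_CHANNEL:
            return mode != "single_channel"
    return True


if __name__ == "__main__":
    print(is_allowed(3, 32, "joint_stereo"))  # mp3 at 32 kbit/s in stereo
    print(is_allowed(2, 32, "joint_stereo"))  # mp2 forbids this combination
    print(is_allowed(3, 384, "stereo"))       # 384 kbit/s does not exist for mp3
```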

The mp3 encoder lame also supports stereo modes at low bitrates.

Why this matters

The point I want to stress here is that in many cases the bitrate chosen for mp3 encoding is too low. On decoding, the sound (music) may still seem to be all there, but the emotional impact of the music is compromised. This choice is made because it is not trivial to compare the original and the reconstructed soundtrack objectively, and because everybody is happy to store more hours of their favourite music on a portable device with limited storage capacity.
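The storage argument is simple arithmetic. A quick sketch, assuming decimal gigabytes and ignoring container overhead; the function name is illustrative:

```python
def hours_of_audio(capacity_bytes: int, bitrate_kbps: int) -> float:
    """Playing time in hours that fits in capacity_bytes at a constant bitrate."""
    return capacity_bytes * 8 / (bitrate_kbps * 1000) / 3600


if __name__ == "__main__":
    one_gb = 10**9
    for kbps in (128, 192, 256, 320):
        print(f"{kbps} kbit/s: {hours_of_audio(one_gb, kbps):.1f} h per GB")
```

Going from 128 to 320 kbit/s cuts the playing time per gigabyte by a factor of 2.5, which explains the temptation to choose low bitrates.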

Examples

About the music

For the purpose of this research I used part of the aria Scherza infida (7m 49s) from Ariodante, Opera in tre atti (London 1735) by George Frideric Handel (1685-1759).

Text of the aria

ARIODANTE:
Scherza infida in grembo al drudo
Io tradito a morte in braccio
per tua colpa ora men vo.
Ma a spezzar l'indegno laccio,
ombra mesta, e spirto ignudo,
per tua pena io tornerò.
(bis)

English translation

ARIODANTE:
Sport, faithless one, in your lover's embrace.
Because of your betrayal I now go forth
into the arms of death.
But to break this vile bond
I will return to haunt you
as a gloomy shade, a mere wraith.

Performers

Dame Janet Baker (mezzo-soprano)
English Chamber Orchestra directed by Raymond Leppard

Software

I used the open source mp3 encoder LAME, version 3.98.4, to encode and decode mp3 files, and my own mergewav (see downloads) to manipulate and compare uncompressed wav files. Around 2000 lines of Unix shell scripting and several auxiliary programs were written to facilitate this research.

Testprocedure to evaluate mp3 encoding

Lame was used to encode uncompressed audio files to mp3 at different bitrates and with different settings of qval, lame's quality-related option (-q).
These files were decoded again to compare them with the original file.
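The encode/decode round trip lends itself to scripting. The sketch below only builds (and, in dry-run mode, prints) the command lines. It assumes a `lame` binary on the PATH and uses its `-m` (mode: `j` joint stereo, `m` mono), `-b` (bitrate in kbit/s), `-q` (qval) and `--decode` options. The file names follow the lame-<qval>-<bitrate> pattern of the result files on this page; the helper names are hypothetical, and this is not the author's original shell scripting.

```python
import shlex
import subprocess


def lame_round_trip_commands(wav: str, qval: int, kbps: int, mode: str = "j"):
    """Command lines to encode wav at a fixed bitrate and decode it again."""
    stem = f"lame-{qval}-{kbps:03d}"
    encode = ["lame", "-m", mode, "-b", str(kbps), "-q", str(qval), wav, f"{stem}.mp3"]
    decode = ["lame", "--decode", f"{stem}.mp3", f"{stem}.wav"]
    return encode, decode


def run(commands, dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for qval in (0, 2, 7):
        for kbps in (128, 160, 192, 224, 256, 320):
            run(lame_round_trip_commands("fragment.wav", qval, kbps))
```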
The RMS of the sample values per channel was used to compensate for (small) differences in amplitude.
Additional comparisons verified that this amplitude compensation, applied without any time shift of the audio track, led to a difference file with minimal RMS amplitude.
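The amplitude compensation and the level of the resulting difference signal can be sketched as follows. Assumptions: both signals are sample-aligned lists for one channel (no time shift, as found above), the gain is the ratio of the RMS values, and the level is given in dB relative to the RMS of the original. The figures on this page are in dBA, which would require an A-weighting filter that this sketch does not apply. Note that amplifying a difference signal 10× raises its level by exactly 20 dB.

```python
import math


def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def compensated_difference(original, decoded):
    """Scale decoded to the RMS of original, then subtract sample by sample."""
    gain = rms(original) / rms(decoded)
    return [o - gain * d for o, d in zip(original, decoded)]


def level_db(signal, reference):
    """RMS level of signal relative to the RMS of reference, in dB."""
    return 20 * math.log10(rms(signal) / rms(reference))


if __name__ == "__main__":
    original = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
    # Stand-in for a decoded file: 3% quieter, plus a small error term.
    decoded = [0.97 * v + 0.001 * math.sin(n) for n, v in enumerate(original)]
    diff = compensated_difference(original, decoded)
    print(round(level_db(diff, original), 1), "dB")
```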

Mono (tests with 1 channel)

To keep the analysis as simple as possible while preserving as much as possible of the original soundtrack and its emotional impact, I converted it to mono ((left + right) / 2), then converted this to mp3 and back for different bitrates and different qval settings of lame.
Again to make the analysis as simple as possible I only used fixed bitrate encoding.
I found no shift in the audio and only a small change in amplitude of 5% or less for which I could compensate.
Because of their large size, audio examples of these tests are not presented here.

I used a rather long fragment and, as I was satisfied with neither the left nor the right channel alone, created mono.wav as follows:
mergewav -t ariodante/Scherza-infida.wav stereo.wav 0 114.02449
mergewav -M00 stereo.wav stereo.wav split-l.wav 1 0 0
mergewav -M11 stereo.wav stereo.wav split-r.wav 1 0 0
mergewav -M00 split-l.wav split-r.wav mono.wav 0.5 0.5 0
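mergewav is the author's own tool. For readers without it, the downmix performed above (mono = 0.5 × left + 0.5 × right) can be approximated with Python's standard wave and array modules. Assumptions: 16-bit PCM stereo input, and a hypothetical function name. Integer division rounds toward minus infinity, which may differ from mergewav's own rounding.

```python
import array
import sys
import wave


def downmix_to_mono(src: str, dst: str) -> None:
    """Write (left + right) / 2 of a 16-bit stereo WAV file as a mono WAV file."""
    with wave.open(src, "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    left, right = samples[0::2], samples[1::2]
    # Python ints do not overflow, and the mean of two int16 values fits in int16.
    mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(mono.tobytes())
```

Usage: `downmix_to_mono("stereo.wav", "mono.wav")`.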

The length of these files is exactly 5028480 samples (114.02449 seconds). This is 4365 frames of 1152 samples.

Stereo (tests with 2 channels)

The mono case gave some insight into the encoding process.
It is, however, more realistic to test the encoding of a stereo audio file, because the bitrate refers to the whole compressed audio stream, not to each channel.
Moreover, in the real world everybody who is not stone-deaf and has two ears wants stereo.
For this second test I created a shorter stereo file fragment.wav with
mergewav -A ariodante/Scherza-infida.wav fragment.wav 23.5 73.994694 0.26122449 2

The length of this file is exactly 2226816 samples (50.495 s). This is 1933 frames of 1152 samples.
I performed essentially the same tests, but now in stereo mode (joint stereo mode for lame encoding).

Comparing the left- and right-channel

As you can hear in the example below, the left and right channels are clearly correlated (correlation coefficient 0.62), which also shows up in the amplitude diagram of the longer fragment.
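The page does not say which correlation coefficient was used; assuming it is Pearson's r between two sample sequences (for example the left and right channel), it can be computed as in this minimal sketch. On Python 3.10 and later, statistics.correlation gives the same result.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sample sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Usage: `pearson(left, right)` for the two channels, or `pearson(original_left, decoded_left)` for the comparisons in the stereo table.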
mp3-quality_files/mono-left-right.jpg

Diagrams for the mono case

Results for qval=0
mp3-quality_files/mono-lame-mp3-quality.jpg
Selected results for qval=0, 2 and 7
mp3-quality_files/mono-lame-qval0-qval7.jpg
Average difference in dB
mp3-quality_files/mono-lame-qval-compared.jpg

Diagrams for the stereo case

mp3-quality_files/lame-qval-compared.jpg

Table of correlation coefficients between the lame mp3 stereo encodings and the original fragment.wav

file left channel right channel
lame-2-128.wav 0.995996575 0.994915004
lame-0-128.wav 0.997202041 0.996359810
lame-2-160.wav 0.997611649 0.996931598
lame-0-160.wav 0.998505363 0.998059627
lame-2-192.wav 0.999899143 0.999898968
lame-0-192.wav 0.999906720 0.999905836
lame-7-128.wav 0.999919437 0.999906273
lame-2-224.wav 0.999941206 0.999943275
lame-0-224.wav 0.999946966 0.999947633
lame-7-160.wav 0.999959154 0.999954031
lame-2-256.wav 0.999967101 0.999968241
lame-0-256.wav 0.999970315 0.999970959
lame-7-192.wav 0.999975788 0.999975815
lame-7-224.wav 0.999985226 0.999986122
lame-2-320.wav 0.999988290 0.999988375
lame-0-320.wav 0.999989789 0.999989995
lame-7-256.wav 0.999990744 0.999991393
lame-7-320.wav 0.999995961 0.999996234

The same data in graphical form:
mp3-quality_files/qclog.jpg

Encoding speed

In these tests, qval=7 gives both the fastest encoding and the best result.
mp3-quality_files/encoding-time.jpg

Low bitrate cases

The sampling_frequency of the original audio file is 44.1 kHz, in accordance with the Red Book (the audio CD standard).
Encoding and decoding by lame preserves this sampling_frequency, except for low bitrates.

bitrate (kbit/s)   resulting sampling_frequency for mono (Hz)   resulting sampling_frequency for stereo (Hz)
32 22050 16000
40 24000 16000
48 32000 22050
64 44100 24000
96 44100 32000
128 44100 44100
160 44100 44100
192 44100 44100
224 44100 44100
256 44100 44100
320 44100 44100

This does not mean that the sound quality is abominable, but it makes it hard to compare these results with the original. Therefore I ignored them.
Note, however, that the range of audio frequencies that can be reproduced is limited to half the sampling_frequency. A sampling_frequency under 24 kHz therefore ipso facto makes significant concessions to the audio quality.
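The half-the-sampling_frequency rule (the Nyquist limit) is easy to tabulate for the sampling frequencies that lame chose in the table above. A trivial sketch with an illustrative function name:

```python
def nyquist_hz(sampling_frequency: int) -> float:
    """Highest audio frequency that a given sampling_frequency can represent."""
    return sampling_frequency / 2


if __name__ == "__main__":
    for fs in (16000, 22050, 24000, 32000, 44100):
        print(f"{fs} Hz sampling -> audio up to {nyquist_hz(fs) / 1000:.2f} kHz")
```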

Listen

We only present the stereo cases.
Original WAV file   filesize (bytes)
fragment.wav 8907310

kbps   mp3 encoded with lame qval=0   mp3 encoded with lame qval=2   mp3 encoded with lame qval=7   filesize (bytes)
32 lame-0-032.mp3 lame-2-032.mp3 lame-7-032.mp3 202320
40 lame-0-040.mp3 lame-2-040.mp3 lame-7-040.mp3 253080
48 lame-0-048.mp3 lame-2-048.mp3 lame-7-048.mp3 303281
64 lame-0-064.mp3 lame-2-064.mp3 lame-7-064.mp3 404544
96 lame-0-096.mp3 lame-2-096.mp3 lame-7-096.mp3 606960
128 lame-0-128.mp3 lame-2-128.mp3 lame-7-128.mp3 808750
160 lame-0-160.mp3 lame-2-160.mp3 lame-7-160.mp3 1010938
192 lame-0-192.mp3 lame-2-192.mp3 lame-7-192.mp3 1213125
224 lame-0-224.mp3 lame-2-224.mp3 lame-7-224.mp3 1415314
256 lame-0-256.mp3 lame-2-256.mp3 lame-7-256.mp3 1617501
320 lame-0-320.mp3 lame-2-320.mp3 lame-7-320.mp3 2021876
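The file sizes in this table agree to within about a quarter of a percent with bitrate × duration / 8, using the duration of fragment.wav (2226816 samples at 44.1 kHz). The small excess presumably comes from header data such as an encoder info frame; that explanation is an assumption, not verified here. A quick check:

```python
SAMPLES = 2226816           # length of fragment.wav, from the text
SAMPLING_FREQUENCY = 44100


def expected_mp3_bytes(kbps: int) -> float:
    """Nominal size of a constant-bitrate stream, headers ignored."""
    duration = SAMPLES / SAMPLING_FREQUENCY  # about 50.495 s
    return kbps * 1000 * duration / 8


# Measured sizes copied from the table above.
MEASURED = {32: 202320, 40: 253080, 48: 303281, 64: 404544, 96: 606960,
            128: 808750, 160: 1010938, 192: 1213125, 224: 1415314,
            256: 1617501, 320: 2021876}

if __name__ == "__main__":
    for kbps, size in MEASURED.items():
        print(f"{kbps:3d} kbit/s: measured/expected = {size / expected_mp3_bytes(kbps):.4f}")
```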

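Below are the optimized (amplitude-compensated) differences between the encoded+decoded versions and the original fragment.wav. The difference signals themselves are provided as mp3 files, encoded with qval=7 at 256 kbps, at amplifications of 1×, 10× and 100×.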

lame qval=0 vs. original lame qval=2 vs. original lame qval=7 vs. original
kbps 1 × 10 × 100 × 1 × 10 × 100 × 10 × 100 ×
128 -45.51 dBA -25.51 dBA -44.01 dBA -41.15 dBA
160 -48.24 dBA -28.24 dBA -46.23 dBA -26.23 dBA -44.17 dBA
192 -40.81 dBA -40.49 dBA -46.69 dBA -26.69 dBA
224 -43.31 dBA -42.90 dBA -48.96 dBA -28.96 dBA
256 -45.84 dBA -25.84 dBA -45.43 dBA -25.43 dBA -31.01 dBA
320 -30.48 dBA -29.86 dBA -34.60 dBA

Evaluation


• Setting lame's qval encoding parameter to 7 gives the best result.
It is also the fastest. However, lame's own help text states otherwise:
Quality related options:
-q n Internal algorithm quality setting 0..9.
     0 = slowest algorithms, but potentially highest quality
     9 = faster algorithms, very poor quality
-h same as -q2
-f same as -q7
=======================================================================
fast mode -f
Same as -q 7.
NOT RECOMMENDED. Use when encoding speed is critical and encoding
quality does not matter. Disable noise shaping. Psycho acoustics are
used only for bit allocation and pre-echo detection.
=======================================================================

Second, mp3 encoding evidently loses quality, although the loss is negligible at 256 or 320 kbit/s.
Third, music encoded at lower bitrates (say 128 kbit/s) seems to lose some of its emotional impact, but this is of course subjective.

Lossless audio compression

There are several codecs for lossless audio compression. A well-known one is FLAC (Free Lossless Audio Codec). It is an open source algorithm, available on most Linux distributions and from http://flac.sourceforge.net/.
Like gzip, FLAC is lossless, but it is better suited to audio files than gzip or zip. There are a few caveats, however.

• As expected, the compression factor of FLAC is smaller than that of mp3; about 500 kbit/s seems to be a typical result.
• Very short files (a click, say) are in fact not compressed but expanded by FLAC.
• Silence, if anyone is interested in that, is compressed better by gzip than by FLAC.
• An empty audio file (0 samples, which is perfectly legal) cannot be handled by FLAC.
• Random noise is hardly compressed at all by FLAC, but that is what you would expect.
• At least in version 1.2.1, the options -0..-8 and --compression-level-0..--compression-level-8 all give the same result.
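The gzip side of these observations is easy to reproduce with Python's standard gzip module (FLAC is not part of the standard library, so it is not shown). Silence, a long run of zero bytes, collapses to almost nothing, while uniformly random bytes do not shrink at all. The buffer size matches the noise and silence files in the table below; the function name is illustrative.

```python
import gzip
import os

N_BYTES = 368640 * 2  # 368640 16-bit mono samples


def gzip_size(data: bytes) -> int:
    """Size in bytes of data after gzip compression (default level 9)."""
    return len(gzip.compress(data))


if __name__ == "__main__":
    silence = bytes(N_BYTES)      # all-zero samples
    noise = os.urandom(N_BYTES)   # uniformly random bytes
    for name, data in (("silence", silence), ("random noise", noise)):
        print(f"{name:12s} {len(data):7d} -> {gzip_size(data):7d} bytes")
```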

Some numerical results for FLAC

content | samples | channels | duration (s) | wav size (bytes) | flac size (bytes) | gz size (bytes) | flac:wav
Aria 'Scherza infida' | 331776 | stereo | 7.523265 | 1327150 | 471082 | 1115615 | 1:2.817
random noise | 368640 | mono | 16.718367 | 737326 | 636378 | 668474 | 1:1.158
silence | 368640 | mono | 16.718367 | 737326 | 9312 | 826 | 1:79
short file (random noise) | 16 | mono | 0.00036 | 78 | 8350 | 117 | 107:1
empty | 0 | (mono) | 0 | 46 | error | 84 | --
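The figures in this table can be cross-checked. The aria row gives a FLAC bitrate of roughly 500 kbit/s, the typical value quoted above, and the last column is the ratio of the wav and flac file sizes. A minimal sketch with illustrative function names:

```python
def bitrate_kbps(file_bytes: int, duration_s: float) -> float:
    """Average bitrate of a file in kbit/s."""
    return file_bytes * 8 / duration_s / 1000


def size_ratio(wav_bytes: int, flac_bytes: int) -> float:
    """How many times smaller the FLAC file is than the WAV file."""
    return wav_bytes / flac_bytes


if __name__ == "__main__":
    # Values from the 'Aria' row of the table.
    print(round(bitrate_kbps(471082, 7.523265)))   # FLAC bitrate in kbit/s
    print(round(size_ratio(1327150, 471082), 3))   # wav size / flac size
```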

Suggestions for further research

The tests I did should be done again with
• a different (better?) mp3 encoder
• variable and fixed bitrate
• other music
More importantly, I can only offer a subjective evaluation.
fMRI offers a more objective way to evaluate the emotional impact of music reproduced at different qualities, although the results should not be overinterpreted.
Another approach is to use a panel of listeners and a procedure such as ITU-R Recommendation BS.1534 (MUSHRA).
