Audio Volume Balance

Posted by: calseeor

Audio Volume Balance - 05/01/2000 11:20

Hello All,
I was wondering if any of you have or know of a program that can equalize the volume of all your MP3 songs? I have just started to record MP3s for my Empeg, once I get it. As it stands, I have 1300 songs so far, all recorded on the same PC, but the volumes of older CDs vs. newer CDs, and those vs. homemade CDs from private recordings, vary. I know this is how the individual CDs are, but I was hoping for some sort of MP3 volume-adjusting software that will batch process all your MP3s to a relative volume level. Any ideas?
I have heard of programs that will batch transform your files from a high kbps to a lower one, but they do not affect the volume. Thanks...
Eryk


Posted by: tfabris

Re: Audio Volume Balance - 05/01/2000 13:25

This was discussed recently on the "Wish List" thread.

Equalizing the volume of your songs is a process called "Normalization". This can only be done at the beginning of the process, when the file is still in .WAV format. This option is available in most ripping software so that it happens automatically for you (provided you enable the option).
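As a rough illustration of what a ripper's normalization option does to the uncompressed audio, here is a minimal sketch of peak normalization. It assumes the samples have already been read from the .WAV as floats between -1.0 and 1.0; the function name `peak_normalize` is made up for this example and is not taken from any particular ripping program.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale float samples so the loudest one lands exactly on target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence (or empty input): nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet_track = [0.0, 0.25, -0.5, 0.1]
louder = peak_normalize(quiet_track)
print(max(abs(s) for s in louder))
```

Every sample is multiplied by the same gain, so the relative dynamics inside the track are untouched; only the overall level moves.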

Because of the way MP3s are encoded, the only way to change the volume of an already-compressed song would be a lossy process: De-compress the MP3 to a .WAV, normalize the .WAV, then re-encode the .WAV a second time. If you re-compress something that's already been compressed once, you might induce audible artifacts. So it's not ideal. In fact, I don't know of any program that does this for you.

Usually, the only way to solve the problem properly is to re-do your collection from the original CDs, while enabling the normalization option in your ripping software. I'm in the same boat as you are: I have a lot of albums that I encoded without normalization, before I knew it was important. I'm not prepared to re-rip everything now.

My suggested solution (on the wish list board here) is to have the MP3 player software auto-adjust the volume of the next song before it begins playing it. It could do this by peeking ahead at the next song and calculating what relative volume it should be played at. This would be a very serious undertaking and it's not trivial to implement at all. The Empeg folks are pretty busy implementing more important features and fixing bugs, so it's highly unlikely we'll see this any time soon (if at all).

There is a new tag specification for MP3s (called ID3v2) that allows a "relative volume" field to be added to each file. At the moment, the Empeg can't read these kinds of tags, so that's not available to us Empeg users, either.

One thing that might help is this: Very soon, the Empeg will have the ability to link a specific song to a specific EQ preset. This is on their "to do" list. You could have some EQ presets that are louder than others (by boosting more frequencies than you cut). For me, this would work perfectly, because my older albums also could use some extra EQ trimming to sound good compared to new albums. These are the same songs that need to be normalized anyway.


Posted by: calseeor

Re: Audio Volume Balance - 06/01/2000 10:54

Ahhh... ok. Thanks. I found the normalization check box and, as you had said, it was not on. I turned it on and will have to, some day, start the process of re-recording my MP3s. At least all the new ones I record will be set.

Thanks for this info.


Posted by: john

Re: Audio Volume Balance - 06/01/2000 11:31

Technically, you can in fact change the normalisation while it's in .MP3, but that requires decoding it, scaling all the coefficients, then recoding. This doesn't involve conversion to .WAV at any point, so it's lossless. And no, I have never seen anything that does this :)

I wonder if patents allow for such a program, even. Anyone a qualified patent lawyer? :)

- John.

(The above may not represent the views of empeg :)
Posted by: tfabris

Re: Audio Volume Balance - 06/01/2000 12:04

If this is true, I would like to enter a detailed discussion of this topic with someone knowledgeable about it. I might be interested in writing a program that does this.

I was under the impression that the MP3 frames were encoded in such a way that prevented this kind of manipulation.

What can you tell me about this? Can it be done simply by locating the proper bytes and altering them? Or would a lot of math be involved? What do you know?

Posted by: john

Re: Audio Volume Balance - 07/01/2000 06:59

> What can you tell me about this? Can it be done simply by locating the proper bytes and altering them? Or would a lot of math be involved? What do you know?

There'd be quite a lot of work involved here, and a bit of math(s) too. First, you need to know the mp3 format. You then decompress (huffman coded I think) the DCT terms for each frame to get the frequency components. Multiply all the components by a scale factor (must be the same for every frame or you'll get glitches between them). Then compress the terms again with huffman and write out an mp3 stream.
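To illustrate why one constant factor applied to every frequency term amounts to a plain volume change, here is a small sketch. It uses an ordinary DCT as a stand-in; real Layer III data goes through a windowed, overlapped transform plus quantisation, so this only demonstrates the linearity argument and is not an MP3 codec. The helper names `dct` and `idct` are made up for the example.

```python
import math


def dct(x):
    """Unnormalised DCT-II of a list of samples."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(c)
    return [
        c[0] / n
        + (2.0 / n)
        * sum(c[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(1, n))
        for i in range(n)
    ]


samples = [0.1, -0.3, 0.25, 0.05, -0.2, 0.15, 0.0, -0.05]
scale = 0.5

# Scale in the frequency domain, then go back to the time domain.
scaled = idct([scale * c for c in dct(samples)])

# The result matches scaling the original samples directly.
print(all(abs(a - scale * b) < 1e-9 for a, b in zip(scaled, samples)))
```

The catch in a real stream is that the scaled terms have to be quantised and Huffman-coded again, which is where the extra work comes from.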

This sounds complex (it is) but it'll take far less time than decompress/recompress via PCM data - definitely something that can be done real time. Not a simple job though.

- John.

(The above may not represent the views of empeg :)
Posted by: tfabris

Re: Audio Volume Balance - 07/01/2000 10:39


But isn't there more to it than just multiplying by a scale factor? Doesn't changing the volume of the frame data change the characteristics of the compression algorithm?

I know the file format, but I don't understand the arcane details of the compression/decompression algorithms. Once I'm past the frame header, I'm lost. That's why I'm asking.

Posted by: Verement

Re: Audio Volume Balance - 09/01/2000 11:04

The "compression algorithm" at the bit level in MPEG audio layer III is really just the Huffman coding and the use of a bit reservoir. Any decoding and recoding here possibly runs the risk of changing the length of the main_data for the frame, and this could cause problems if the bit reservoir is run short.

What might be interesting to try instead is to manipulate the global_gain value for each of the two granules in each frame. Since this is a constant-width field in the frame's side information, modifying it would be trivial. However, you would still need to decode the entire stream to determine an appropriate scaling factor. Also, I have no idea whether the effects of changing this value are really acceptable in practice -- it's only an 8-bit value, and it's not clear to me how much the other coded information is predicated on it.

Another option it seems to me is rather than modify the audio stream itself, modify the decoder to perform the scaling in real time while the audio is playing. This assumes the player can know in advance by what factor to scale, but people have talked about some ways to accomplish this.

This does sound like an interesting project for someone to pursue. As far as patents are concerned, to my knowledge most of them concern the psychoacoustic model and the encoding process; what I think is being proposed here is just some bit twiddling on the resultant bitstream.

-v

Posted by: dionysus

Re: Audio Volume Balance - 09/01/2000 12:47

...for the sake of the empeg, wouldn't it make more sense to simply do it on the player side? Maybe even as a plugin that pre-analyzes the next song, and lowers or raises the volume accordingly...

This would mean that those of us with 100 CDs on our setups don't have to spend time looking for our originals...

...proud to have one of the first Mark I units
Posted by: Verement

Re: Audio Volume Balance - 09/01/2000 13:47

Unfortunately modifying the global_gain alone as I suggested doesn't seem viable; some tests I did suggest it will result in utterly unrecognizable audio.

Any in-place modification of the MP3 stream would probably need to recode the entire frame (without overrunning the bit reservoir.)

-v

Posted by: tfabris

Re: Audio Volume Balance - 09/01/2000 13:59


> What might be interesting to try instead is to manipulate the global_gain value for each of the two granules in each frame. Since this is a constant-width field in the frame's side information, modifying it would be trivial.

Now we're talking!

You seem to really know your stuff. Can you help give me some specifications as to how I might be able to locate these bytes in a standard 128kbps 44khz MP3 file? Are they part of the frame header? I've already got some code that dissects the frame header a little bit and performs some simple operations based on it. I'd love to experiment with it and see what I can come up with.


> However, you would still need to decode the entire stream to determine an appropriate scaling factor.

Yes, of course. If you wanted to go to that kind of trouble. Although a simpler program could just allow you to manually enter the scaling factor, then preview your changes by ear.

If it worked at all, anyway...


> Another option it seems to me is rather than modify the audio stream itself, modify the decoder to perform the scaling in real time while the audio is playing. This assumes the player can know in advance by what factor to scale, but people have talked about some ways to accomplish this.

Exactly. As I've said before, a playback-only normalizer sounds technically feasible (but not easy).



Posted by: tfabris

Re: Audio Volume Balance - 09/01/2000 14:01


Oh, never mind, then. :-)

Damn, had me excited for a minute there.



Posted by: tfabris

Re: Audio Volume Balance - 09/01/2000 14:12


> ...for the sake of the empeg, wouldn't it make more sense to simply do it on the player side? Maybe even as a plugin that pre-analyzes the next song, and lowers or raises the volume accordingly...

Yeah, it's in the "Wish List" thread. I put it there. But like I said, it's not trivial to implement something like that. It would take a lot of time for a programmer to implement it properly. The Empeg programmers have a lot on their plate as it is right now.

Even if it could be implemented, it would require a significant amount of CPU time, and it would also double the amount of disk access the player performs.

Wait a minute... would it?

If the player is already precaching the next song anyway (I know it does), then maybe it wouldn't increase disk usage. Well, at least for short songs. For symphonies and one-track rock operas, it would require quite a bit of extra disk access. But for pop songs, you could probably decode and analyze most (all?) of the next track without hitting the disk again. Hmm...


Posted by: dionysus

Re: Audio Volume Balance - 09/01/2000 14:39

I would think that it could be made as a program that runs ON THE EMPEG (so that you don't have to actually re-upload your songs) that maybe adds a comment in every MP3 file on the empeg w/ the normalization value... The player software could then just read this comment to get the right value... and future songs' normalization values could be added using emplode. That way you could just run the program for an afternoon or so, as opposed to having to re-upload every song over again.

...proud to have one of the first Mark I units
Posted by: Verement

Re: Audio Volume Balance - 09/01/2000 14:56

In case you're still interested in playing around with it (I may have goofed in my haste to test), I'm attaching a code excerpt which decodes the layer III side information from the bitstream, immediately following the header (and CRC, if there is one.) This isn't the entire frame, as the main_data portion can be located before the header and continues after the side information. The main_data_begin is a negative-offset pointer from the beginning of the frame to the beginning of the main_data. Code to decode the main_data (which includes the Huffman codes) is not included.

The bit_read() routine is supposed to read the number of specified bits from the stream and return their value. The code should be enough to show you where the global_gain field is, among others. To understand all the fields though you really need to have a copy of the MPEG audio spec.
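For readers without the attachment, a bit reader along the lines described might look like the Python sketch below. This is not the attached code. It assumes an MPEG-1 Layer III frame, where, going by the field widths in the MPEG audio spec, the side information opens with a 9-bit main_data_begin, then 5 private bits (mono) or 3 (stereo), 4 scfsi bits per channel, and then 59 bits per granule per channel. `BitReader` and `read_side_info` are hypothetical names chosen for the example.

```python
class BitReader:
    """Reads big-endian bit fields from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def bit_read(self, count):
        """Return the next `count` bits as an unsigned integer."""
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_side_info(data, channels):
    """Return (main_data_begin, [global_gain, ...]) from Layer III side info.

    `data` must start at the first side-information byte, i.e. right after
    the 4-byte header and the optional 2-byte CRC.
    """
    reader = BitReader(data)
    main_data_begin = reader.bit_read(9)
    reader.bit_read(5 if channels == 1 else 3)  # private_bits
    reader.bit_read(4 * channels)               # scfsi, 4 bits per channel
    gains = []
    for _granule in range(2):
        for _channel in range(channels):
            reader.bit_read(12)                 # part2_3_length
            reader.bit_read(9)                  # big_values
            gains.append(reader.bit_read(8))    # global_gain
            reader.bit_read(30)                 # remaining fields, skipped here
    return main_data_begin, gains
```

With those widths the side information comes to 17 bytes for mono and 32 bytes for stereo. Streams at the lower MPEG-2 sample rates use a different layout and would need their own field widths.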

I don't know why the BBS isn't registering my attachment, so get the code from here.

Cheers,
-v

Posted by: tfabris

Re: Audio Volume Balance - 09/01/2000 15:28

Thanks! I've downloaded the code and I'll play with it when I get a chance.

Posted by: tfabris

Re: Audio Volume Balance - 09/01/2000 15:58


Interesting idea.

What you're saying is... write a Linux program to run on the Empeg... It runs through your Empeg's files, analyzes their volume levels, and adds the appropriate relative-volume command to the music database for each song.

Simple. Elegant. Relatively easy to implement (when compared to a dynamic system). Doesn't affect caching or CPU cycles during playback. The resulting data could simply appear as another field on the song's property sheet in Emplode, allowing you to manually tweak the numbers if you choose.

It could even happen automatically at upload time, as part of the synchronization process. After the rest of the synch is complete, any songs on the Empeg's hard drive that don't yet have the field tagged would get scanned. The first time a user runs this new version of the software, it would run on all their songs, and it would take a long time, but subsequent runs would only analyze new uploads, so it would be quick.

Even then, you could optimize the process by only analyzing some of the frames. Say, every other frame or every third frame. I'd bet you'd get the same results. A little experimentation could give you the optimal balance between how accurate the analysis is compared to how much time you spend decoding frames.
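The frame-skipping idea can be sketched as follows. The example assumes the decoder hands back each frame as a list of float samples; `estimate_rms` and its `step` parameter are invented for illustration. One caveat worth stating: an average level tends to survive this kind of subsampling, but a peak measurement can simply miss the one frame that contains the loudest sample.

```python
import math


def estimate_rms(frames, step=1):
    """Estimate the RMS level of a track from every `step`-th decoded frame."""
    if step < 1:
        raise ValueError("step must be at least 1")
    total = 0.0
    count = 0
    for frame in frames[::step]:
        total += sum(s * s for s in frame)
        count += len(frame)
    return math.sqrt(total / count) if count else 0.0


# Six tiny made-up "frames"; a real frame holds 1152 samples per channel.
frames = [[0.5, -0.5], [0.4, -0.4], [0.5, -0.5],
          [0.6, -0.6], [0.5, -0.5], [0.4, -0.4]]
print(estimate_rms(frames, step=1), estimate_rms(frames, step=3))
```

Comparing the two numbers for a handful of real tracks is the kind of experiment that would show how far `step` can be pushed.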

The initial version could simply run the analysis as a batch job after the songs have been uploaded. Future versions could speed up the process by performing the analysis as the song data is being uploaded, resulting in practically no extra synch time wasted at all.

Of course, it would have to be a selectable option in Emplode-- Users who have already pre-normalized their songs should be able to turn off the option.

But most importantly, this is an elegant solution because it would be transparent to the end-user. They wouldn't need to worry about anything more than whether or not to select this menu option. Just select the menu option, and the Empeg magically normalizes all the songs on the next synch.

I like it. I really like it.

"Mac", are you reading this? What do you think?



Posted by: dionysus

Re: Audio Volume Balance - 09/01/2000 17:08

Exactly... Songs that haven't already been normalized would have a value equivalent to 0, and songs that have been normalized would get bumped up or down depending... And the best part is, the original file would be left alone, and the user could change the setting for that particular song (through emplode?) if he or she chooses...

...proud to have one of the first Mark I units
Posted by: tfabris

Re: Audio Volume Balance - 09/01/2000 20:07


Well, the actual coding would require that the "not yet normalized" flag be something other than a 0 in the field, since it's possible that a given song might not need any change in volume. I don't know how the values are stored in the music database, so I don't know how it would be flagged. But yeah, that would be the basic idea.
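One way to keep "not scanned yet" distinct from "scanned, no change needed" is to give the field an explicit empty state. The sketch below is purely illustrative: `TrackRecord` and `gain_offset` are hypothetical names, and nothing here reflects how the Empeg's music database actually stores its fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackRecord:
    title: str
    # None means "never analysed"; 0 means "analysed, leave the volume alone".
    gain_offset: Optional[int] = None


def needs_analysis(track):
    """True only for tracks that have never been scanned."""
    return track.gain_offset is None


library = [
    TrackRecord("already fine", gain_offset=0),
    TrackRecord("too quiet", gain_offset=6),
    TrackRecord("new upload"),
]
print([t.title for t in library if needs_analysis(t)])
```

A database that can only hold numbers could get the same effect with a reserved out-of-range value instead of `None`.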

I'd really love to hear what Mike Crowe ("Mac" here on the BBS) has to say about this.

Oh, and sorry for carrying on such a technical discussion on the "General" message board. :-)


Posted by: mac

Re: Audio Volume Balance - 10/01/2000 04:37

> I'd really love to hear what Mike Crowe ("Mac" here on the BBS) has to say about this.

*Looks around*
Who? me?

Oh, ummmm.

After minimal thought it does look feasible to normalise songs during download or to add a normalise option to emplode which works rather like the "play on empeg" toolbar button.

An alternative (suggested by someone else some time ago) is to analyse the song as it is played the first time through and then store a normalisation value ready for next time.
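Analysing during the first play-through amounts to keeping running totals while the decoder produces audio anyway. A minimal sketch, assuming the player can pass each decoded block of float samples to a hook; the class name `PlaybackAnalyser` and its methods are made up and do not describe the actual player software:

```python
import math


class PlaybackAnalyser:
    """Accumulates level statistics block by block as a track plays."""

    def __init__(self):
        self.sum_squares = 0.0
        self.count = 0
        self.peak = 0.0

    def feed(self, samples):
        """Call once per decoded block, with a list of float samples."""
        for s in samples:
            self.sum_squares += s * s
            self.peak = max(self.peak, abs(s))
        self.count += len(samples)

    def result(self):
        """Return the figures to store, or None if nothing was played."""
        if self.count == 0:
            return None
        return {"rms": math.sqrt(self.sum_squares / self.count),
                "peak": self.peak}


analyser = PlaybackAnalyser()
analyser.feed([0.5, -0.5])
analyser.feed([0.25, -0.75])
print(analyser.result())
```

A sensible policy would be to store the result only when the track played to the end, since a skipped track has only been partly measured.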

I think that automatically normalising the next song before it is played is rather too complex. Dealing with track skipping and the fact that the whole of the next track won't actually be cached would require the disk to be spun up much more than is desirable.

--
Mike Crowe
I may not be speaking on behalf of empeg above :-)
Posted by: Dearing

Re: Audio Volume Balance - 10/01/2000 07:50



> An alternative (suggested by someone else some time ago) is to analyse the song as it is played the first time through and then store a normalisation value ready for next time.


I was going to recommend the same thing. Not only for normalisation, but also for the "Time Remaining" counter. The play time could just be stored in another field of the database.
I wouldn't mind having to play the song first before I have this info the first time.


_~= Dearing =~_
"WAY too happy about having #99."
Posted by: rjlov

Re: Audio Volume Balance - 10/01/2000 17:40

It would certainly be nice to store information calculated
during the playing of a song for further reference.

I can think of at least four other applications of this
that would be nice (if the player had hooks so we could add
modules in).

You could even have "persistent" volume adjustments made by the user during
the song, so that next time it is played, the volume adjustments are
automatically made at the same times.

However, is it feasible? I thought all partitions were
mounted read only.

How difficult would this sort of thing be?

Also, I seem to recall seeing somewhere that we would be able to
retrieve statistics from the player as to how many times a particular
track had been skipped, and that sort of thing. Is that implemented
yet?

Richard

Posted by: tfabris

Re: Audio Volume Balance - 10/01/2000 17:56

> Who? me?

Yes, you.

> After minimal thought it does look feasible to normalise songs during download or to add a normalise option to emplode which works rather like the "play on empeg" toolbar button.

Cool! Sounds great! So when can we see it implemented? Day after tomorrow?

Just kidding. I don't know about everyone else, but there's other stuff I'd rather see sooner, such as linking different EQ presets to specific songs, and having the EQ's automatically switch personalities depending on whether it's plugged in at home or in the car. I'm sure you've got your own lists of priorities anyway...






Posted by: tanstaafl.

Re: Audio Volume Balance - 11/01/2000 19:24

I am a little bit confused about this "Normalization" business. What exactly is normalization supposed to accomplish? It sounds like you are attempting to have all the music play at the same volume, but this would not be right. I don't expect a Haydn string quartet to make as much noise in my car as, say, Metallica. Some music is supposed to sound louder than other music.

Would you please expand on this normalization concept a bit for me?

Thanks...

tanstaafl.



"There Ain't No Such Thing As A Free Lunch"
Posted by: tfabris

Re: Audio Volume Balance - 11/01/2000 20:05


> I don't expect a Haydn string quartet to make as much noise in my car as, say, Metallica. Some music is supposed to sound louder than other music. Would you please expand on this normalization concept a bit for me?

You're right, some music is supposed to sound louder than other music.

However, in the case of CDs, songs that are supposed to sound at the same volume might not be that way, depending on how the CD was mastered.

For instance, old CDs that were made in the heyday of vinyl don't push the dynamic range of the format very much. They were mastered for the dynamic range of LPs, and they don't get as loud as the CD will go. But new CDs do, because they've been mastered from the ground up with the CD format in mind. Compare any CD from the 80's to any CD created within the last few years. All the recent CDs are much louder, by more than just a few decibels.

This even applies when you're comparing the exact same song. For instance, you might have an old CD in your collection that's got the exact same song as a new remastered version of the album. The old CD will be MUCH quieter. Often, the songs get normalized when they create an artist's "greatest hits" album so that the songs fit together as a collection better.

Normalization isn't an issue for regular CD players because you'll tend to listen to one album at a time, and the whole album will be at its proper relative levels. But when you have something like the Empeg, you could be listening to a fifteen-year old track right before a recent track. What happens is that you turn up the volume for the old song, then when the new song starts, it's THIS F***ING LOUD and you have a heart attack scrambling for the volume control and hope that you didn't just blow your 200-dollar speakers.

Now, obviously this applies to pop music more than classical music. However, the same issues exist: older albums won't ever hit the peak levels on the CD, while new ones will. Remember that normalization doesn't just artificially make a song louder... the idea is that you measure the peaks in a song, then you normalize the volume so that one song's peaks are pretty close to the same as the last one's. Even quiet songs can have loud peaks.

Radio stations have known this for years, and they use compressor/limiters to normalize their music. They also compress the audio to fit the dynamic range of FM radio, and to make the songs blend better with the commercials.

The Empeg, by its very nature, clearly shows how much difference there is in the mastering techniques from album to album. The equalization, compression, and overall volume of a CD will vary widely from album to album. That's why we need features like per-song EQ and normalization.



Posted by: tadzio

Re: Audio Volume Balance - 12/01/2000 02:27

> It sounds like you are attempting to have all the music play at the same volume, but this would not be right. I don't expect a Haydn string quartet to make as much noise in my car as, say, Metallica. Some music is supposed to sound louder than other music.

You are right. That's why I like the idea of having some sort of "global gain" attribute stored with each song in the empeg's database, with no changes to the actual music data. There could be a function in Emplode to analyze the average volume of a song and then make a proposal for that attribute, but the user would be able to manually change it, or not have it set at all.

In my eyes this combines the advantages of many of the proposals here: rather easy to implement on the empeg side, no distortion due to re-evaluating the music data, total user control over the result.
As an extra bonus, this attribute could be taken from the corresponding ID3v2 Tag value if it exists...

Regards
Daniel



Posted by: tfabris

Re: Audio Volume Balance - 12/01/2000 10:36


> There could be a function in Emplode to analyze the average volume of a song and then make a proposal for that attribute, but the user would be able to manually change it, or not have it set at all.

Yup, that would be the perfect solution, for all the reasons you mentioned.

One important issue with it, though... (Oh, Mike, are you still reading this thread?)

You suggest that Emplode analyze the song. I disagree. I believe the analysis should run in Linux on the Empeg itself (and be triggered by Emplode during the synch process). Reasons:

1) There's no MP3-decoding code in Emplode (as far as I know); they'd have to write a decoder into Emplode before it could do it. On the Empeg itself, the decoder is already there.

2) When you upgrade to the software version that contains this feature, it would be able to scan all the songs that you've already uploaded to the Empeg. You wouldn't have to re-scan the files as they exist on your computer. For some people, the MP3s in their emplode might not have a 1:1 correspondence with the ones stored on their computer. I know that I personally have altered the directory structure on my computer since uploading the bulk of my songs to the Empeg.

3) The normalization values could get written directly to the Empeg's music database immediately as the scan was progressing. More efficient than uploading these fields after-the-fact.

4) If it was done as part of (or at the end of) the synchronization process, then it becomes totally transparent to the end-user.

Mike? Comments?

Posted by: Verement

Re: Audio Volume Balance - 26/02/2000 00:13

I take this back.

It turns out the Layer III global_gain field can be modified after all, with the net effect of increasing or decreasing the overall gain of the output just as you might expect. To be consistently uniform, it needs to be offset by the same amount in every channel in every granule in every frame (two channels per granule in stereo streams, and two granules per frame.) The code I posted earlier should provide enough information to locate the field(s) within a frame.
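The uniform offset described above could be sketched like this in Python. It is an illustration under stated assumptions, not the code referred to in the post: it assumes MPEG-1 Layer III side information (a 9-bit main_data_begin, 5 or 3 private bits, 4 scfsi bits per channel, then 59 bits per granule per channel with global_gain following the 12-bit part2_3_length and 9-bit big_values), and all function names are invented for the example.

```python
GRANULE_BITS = 59        # side-info bits per granule per channel
GAIN_FIELD_OFFSET = 21   # part2_3_length (12) + big_values (9) come first
GAIN_FIELD_WIDTH = 8


def global_gain_bit_offsets(channels):
    """Bit positions of every global_gain field, from the start of side info."""
    preamble = 18 if channels == 1 else 20  # main_data_begin + private + scfsi
    return [preamble + i * GRANULE_BITS + GAIN_FIELD_OFFSET
            for i in range(2 * channels)]   # two granules per frame


def read_bits(buf, bit_pos, count):
    value = 0
    for i in range(bit_pos, bit_pos + count):
        value = (value << 1) | ((buf[i >> 3] >> (7 - (i & 7))) & 1)
    return value


def write_bits(buf, bit_pos, count, value):
    for n in range(count):
        i = bit_pos + n
        mask = 1 << (7 - (i & 7))
        if (value >> (count - 1 - n)) & 1:
            buf[i >> 3] |= mask
        else:
            buf[i >> 3] &= ~mask & 0xFF


def offset_global_gain(side_info, channels, delta):
    """Add `delta` to every global_gain field of one frame, in place.

    `side_info` is a bytearray holding just the frame's side information.
    Refuses to change anything if a field would leave the 8-bit range,
    because clamping only some fields would make the gain non-uniform.
    """
    positions = global_gain_bit_offsets(channels)
    new_values = [read_bits(side_info, p, GAIN_FIELD_WIDTH) + delta
                  for p in positions]
    if any(v < 0 or v > 255 for v in new_values):
        raise ValueError("offset would push a global_gain field outside 0..255")
    for p, v in zip(positions, new_values):
        write_bits(side_info, p, GAIN_FIELD_WIDTH, v)
```

The same `delta` has to be applied to every frame in the file. Frames that carry the optional CRC would, as far as I know, also need their checksum recomputed, since it covers the side information.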

Sorry I goofed when I tested this before. I have no idea what I was doing.

I might even have enough code to compute an appropriate offset value in order to normalize the output... if anyone still has any interest in this, let me know.

-v


Posted by: Henno

Re: Audio Volume Balance - 26/02/2000 04:37

> if anyone still has any interest in this, let me know
Verement, you've just made the understatement of the year! (so far)

Remember that this thread started out with a post from calseeor that said:
> As it stands, I have 1300 songs so far, all recorded on the same pc but the volumes of older cds vs. newer cds and those vs. homemade cds from private recordings, vary.
> ( . . ) I was hoping for some sort of mp3 volume adjusting software that will batch process all your mp3's to a relative volume level.

I bet we all have dozens of playlists that we'd like to have normalised among the hundreds that are recorded right!
Yes Yes Yes, we're interested. Especially if you can make it run in the empeg box.

> Sorry I goofed when I tested this before. I have no idea what I was doing.
We'll forgive, once you deliver

Henno
# 00120
Posted by: tfabris

Re: Audio Volume Balance - 26/02/2000 10:29


I haven't had a chance to look at the code you posted before. I downloaded it to my hard disk, and it's on my "to do" list...

Of course we'd be interested in it. And I'll tell you what: If you could come up with a simple Windows front-end to select (or group select) MP3 file(s) and batch-process their gain fields, you could shareware-sell that puppy over the internet. There would be a big demand for such a thing.

One interesting issue:

I've been doing a little experimenting, and I've discovered that peak-detection normalization is not going to work "automatically", since many of the albums that I consider "quiet" are actually already peak-normalized. It's just that newer albums are compressed more, with the quiet bits being closer to the peaks. If you increase the global gain on a track that is already normalized, you will clip the peaks just a little bit. For the albums that I want to increase, though, I don't think the effect would be noticeable. If you could come up with a simple test program, I have a few tracks in mind that I could try it on (ones that sound VERY quiet overall except for a couple of very large peaks) and see if it's viable to do that.

The other option is to leave the old albums unchanged, and only reduce the global gain on the newer albums. That would have the same net effect, but without the peak clipping.
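The "only turn things down" option can be sketched as picking the quietest track as the reference and attenuating everything else to match. The loudness figures below are invented numbers standing in for whatever average-level measurement is used, and `attenuation_plan` is a hypothetical name.

```python
def attenuation_plan(levels):
    """Map each track to a gain of at most 1.0 that matches the quietest track.

    `levels` maps track name -> measured average level (linear, > 0 for
    anything audible). Silent tracks are left untouched.
    """
    audible = [v for v in levels.values() if v > 0.0]
    if not audible:
        return {name: 1.0 for name in levels}
    reference = min(audible)
    return {name: (reference / v if v > 0.0 else 1.0)
            for name, v in levels.items()}


measured = {"old quiet album": 0.1, "recent loud album": 0.4}
print(attenuation_plan(measured))
```

Because no gain ever exceeds 1.0, no peak can be pushed into clipping; the cost is that the whole collection ends up as quiet as its quietest member.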



Tony Fabris
Empeg #144
Posted by: dionysus

Re: Audio Volume Balance - 26/02/2000 10:33

do it, do it, do it, yeah yeah yeah!

...proud to have one of the first Mark I units
Posted by: Verement

Re: Audio Volume Balance - 26/02/2000 11:23

I don't think I could be convinced to write a Windows interface, but if you wanted to do that I could supply the back-end. The only catch is all my code will be licensed under GPL.

The idea I have to come up with a normalization offset is to look at the requantized frequency values coming out of the Huffman decoder to determine the absolute peak -- and then compute the gain offset needed to normalize it, applying this gain to the entire bitstream. As it turns out, the way the requantization formula works makes this almost trivial to compute. Each global_gain offset multiple of 4 will affect the frequency value amplitudes by a factor of 2.
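The arithmetic behind that last sentence can be written out. If every 4 steps of global_gain doubles the amplitude, one step is a factor of 2^(1/4), or about 1.5 dB. The sketch below assumes the peak has already been found and expressed on a scale where 1.0 is full scale; both function names are made up for the example.

```python
import math


def amplitude_factor(offset):
    """Amplitude multiplier produced by a global_gain offset of `offset` steps."""
    return 2.0 ** (offset / 4.0)


def gain_offset_for_peak(peak):
    """Largest whole offset that keeps `peak` at or below 1.0 (full scale).

    Negative results mean the track has to be turned down.
    """
    if peak <= 0.0:
        return 0  # silence: nothing to normalise
    return math.floor(4 * math.log2(1.0 / peak))


for peak in (0.5, 0.3, 1.0):
    offset = gain_offset_for_peak(peak)
    print(peak, offset, peak * amplitude_factor(offset))
```

Because the offset has to be a whole number, the normalised peak usually lands a little under full scale rather than exactly on it.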

I think you already said this, but the only time I could see this not working well is when the bitstream is already "normalized" but the peaks do not correspond with maximum frequency amplitudes the way it is intended to be heard. For example, you may not want to peak-normalize something intended to be quiet with very low peaks throughout. In this case I think some user discretion is needed.

I'll try writing some code to do this anyhow and you can see how well it works.

-v

Posted by: dionysus

Re: Audio Volume Balance - 26/02/2000 14:02

...If the processing is to be done on the linux side, it might not be a bad idea to build an undo feature into your program; a little script that undoes the changes in case something goes wrong..
-mark

...proud to have one of the first Mark I units
Posted by: tadzio

Re: Audio Volume Balance - 26/02/2000 19:16

> The idea I have to come up with a normalization offset is to look at the requantized frequency values coming out of the Huffman decoder to determine the absolute peak.

Normalization is great, but I don't think the scaling factor should be based on the peak level, as this is a very "local" criterion. RMS (the average energy in the signal, measured over the whole song) sounds much more useful to me. And it shouldn't be much harder to implement than just peak detection.
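The difference between the two measurements is easy to show on made-up data. This sketch assumes decoded float samples between -1.0 and 1.0; the sample lists are invented, not taken from any real track.

```python
import math


def peak(samples):
    """Largest absolute sample value: determined by a single instant."""
    return max((abs(s) for s in samples), default=0.0)


def rms(samples):
    """Root mean square: an average over every sample in the song."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Two toy "songs" with the same peak but very different average energy.
compressed = [0.9, -0.9, 0.9, -0.9, 1.0, -0.9]
dynamic = [0.1, -0.1, 0.1, -0.1, 1.0, -0.1]

print(peak(compressed), peak(dynamic))
print(rms(compressed), rms(dynamic))
```

A peak-based normalizer would treat both lists as equally loud, while the RMS figures differ by roughly a factor of two.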

And if you provide the back-end, I would be willing to do a simple Windows front-end, also GPLed.

Daniel


Posted by: Verement

Re: Audio Volume Balance - 26/02/2000 19:52

> Normalization is great, but I don't think the scaling factor should be based on the peak level, as this is a very "local" criterion. RMS (the average energy in the signal, measured over the whole song) sounds much more useful to me. And it shouldn't be much harder to implement than just peak detection.

Could you elaborate on this a bit more?

The problem I see is that you don't want to increase the gain so much that the peaks are outside the (-1.0, +1.0) normalized range, otherwise you'll get clipping. How would you propose to calculate an appropriate gain?

> And if you provide the back-end, I would be willing to do a simple Windows front-end, also GPLed.

Sounds like we'll get a Windows version one way or another. I have this idea that it could also be run batch-style on the empeg itself...

-v

Posted by: tfabris

Re: Audio Volume Balance - 27/02/2000 01:00

> Could you elaborate on this a bit more?

What Tadzio was saying is: If you just measure the peaks, you're not going to get the desired result.

Technically, the definition of normalizing the file is measuring the highest peak in the song and then bringing up the level of the whole song so that the one peak hits 100%.

But that won't get the desired result. The desired result is to have all the songs at a similar apparent volume. But the only way to do this properly is either with a) compression, or b) getting a more average volume from the track rather than just measuring the peaks.

You're right in that you don't want to adjust the gain so much that it clips the peaks. So no matter what algorithm you choose to implement, you need to measure the peaks so that you can notify the user if they choose to exceed 100% so that they can decide if a few clipped peaks are OK.
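Putting those two requirements together, a gain calculation might aim for an average level but still check the peak and report when the request would clip. This is a sketch under assumptions: float samples between -1.0 and 1.0, a target expressed as an RMS value, and a hypothetical function name `plan_gain`.

```python
import math


def plan_gain(samples, target_rms):
    """Work out the gain needed to hit target_rms and whether it would clip.

    Returns a dict with:
      requested - gain that brings the average level to target_rms
      max_clean - largest gain that keeps the peak at or below 1.0
      clips     - True if `requested` exceeds `max_clean`
    """
    if not samples:
        return {"requested": 1.0, "max_clean": 1.0, "clips": False}
    current_rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    current_peak = max(abs(s) for s in samples)
    if current_rms == 0.0:
        return {"requested": 1.0, "max_clean": 1.0, "clips": False}
    requested = target_rms / current_rms
    max_clean = 1.0 / current_peak
    return {"requested": requested,
            "max_clean": max_clean,
            "clips": requested > max_clean}


quiet_but_peaky = [0.5, 0.0, 0.0, 0.0]
print(plan_gain(quiet_but_peaky, target_rms=1.0))
```

A front-end could show the `clips` flag as a warning and let the user choose between the requested gain and the clean maximum.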

Let me elaborate further and give a concrete example. This comes from direct recent experience:

I have an 80's-era album by Yes titled "Big Generator". It's a really good album. But I don't listen to it as much as I should because it's mastered poorly. Its apparent volume level is very quiet. Even on the really loud rocking songs (such as "Almost Like Love", which includes some full-blast Tower Of Power-style horn section work), it seems dull, quiet, and lifeless.

On the other hand, a copy of Madonna's recent "Ray of Light" is mastered in such a way that it seems many, many times louder. I'm not talking just a few notches on the volume knob. I mean that the quietest song on Ray of Light seems at least 15 dB louder than the loudest song on Big Generator. On my Empeg, I have to crank Big Generator all the way up to 0 dB to make it sound good. But when I listen to Ray of Light, that level would be damaging to my speakers. So for Ray of Light, I have to crank it back to at least -10 or -15 dB before things are OK again.

But if I take the raw .WAV data from those albums and look at them in a wave editor, I find that they both are already normalized. The peaks on Big Generator hit 100%, and the peaks on Ray of Light also hit 100%.

So what's the difference? Compression.

Compression, normalization, and equalization are among the last things done to an album before it gets pressed onto a CD. The process is known as "Disc Mastering" and there are companies that specialize in doing just that. They know how to take the artist's master tapes and tweak them so that they sound good in the real world.

Ray of Light is mastered like a TV commercial: Highly compressed, so that everything seems loud. Big Generator is mastered more like a classical album, without any attempt to compress the audio before the final master.

Surely it's better to have music without the compression right? No, not necessarily. Even music with lots of dynamics can benefit from compression. The trick is to do it well. The production on Ray of Light is very crisp, and the album, sonically, is amazing to listen to. Big Generator's mix sounds muddy and dull in comparison.

I believe that if you were to apply a standard peak-detection algorithm to either of these albums, there would be little or no change made to the apparent volume of either one.

But if you tried to glean some kind of an average volume from them (not an easy task!), you would probably severely clip the Big Generator album's peaks in an attempt to make it sound as loud as Ray of Light.

I don't see an easy algorithmic solution to this problem. The only thing I can think of is to let the user decide for themselves how much to adjust the global gain. For instance, in my case, I might try adding just a couple of dB to Big Generator and then listen to see if the clipped peaks are noticeable. But overall, what I'd rather do is reduce the gain of the loud albums instead of adding gain to the quiet albums. Of course, there's probably an issue with a sonic "floor" there, too, and it might be just as bad of a problem. We'll have to experiment and see, I guess.

What a fascinating problem. This really cuts to the root of the issues of digital audio production. Why are recent CDs mastered so differently than older CDs? What is it about the mastering process that makes a disc sound good or bad? Proper compression is truly an art form.



Tony Fabris
Empeg #144

Posted by: tadzio

Re: Audio Volume Balance - 27/02/2000 05:16

Could you elaborate on this a bit more?

The problem I see is that you don't want to increase the gain so much that the peaks are outside the (-1.0, +1.0) normalized range, otherwise you'll get clipping. How would you propose to calculate an appropriate gain?


Well, to avoid clipping totally, we would have to actually compress the song, as Tony described. This would, however, be much more complicated than just increasing a 'gain' field, so it isn't feasible.

But clipping a few samples doesn't affect sound quality nearly as much as many think. Just to make sure I know what I'm talking about, I loaded an already normalized song (Uriah Heep - Lady In Black) into CoolEdit and amplified it by 10 dB (a factor of about 3). Clipping, of course, occurred all over the place - nearly every beat was clipped. But listening to it, it still sounded good. Of course, I could hear a difference, but this is rather an extreme example.
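For reference, the relation between a gain in dB and the factor applied to the sample values can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not what CoolEdit does internally; the helper names are invented for the example.

```python
import math


def db_to_amplitude(db):
    """Convert a gain in dB to a linear factor for sample values."""
    return 10.0 ** (db / 20.0)


def amplitude_to_db(factor):
    """Convert a linear sample-value factor back to dB."""
    return 20.0 * math.log10(factor)


def clip(samples, full_scale=1.0):
    """Hard-limit every sample to the range [-full_scale, +full_scale]."""
    return [max(-full_scale, min(full_scale, s)) for s in samples]


factor = db_to_amplitude(10.0)  # a little over 3
boosted = clip([s * factor for s in [0.1, 0.3, -0.4, 0.2]])
```

With these definitions a 10 dB boost multiplies every sample by roughly 3.16, so any sample above about 32% of full scale ends up clipped.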

Thinking about it, I now see that we actually have two questions here:

1. What's the right 'metric' to measure the 'loudness' of a song?

The whole idea of normalization is that you don't have to fumble with the volume control all the time. That means that all the songs should have the same subjective 'loudness'. When we want to write a program that lets the user decide how much he wants to amplify a certain song, then we need to tell him how loud it actually is. If this has to be just one value, then it really should be averaged over the whole song, not just a few samples. The average energy in the sound signal (expressed in dB) is not a bad metric for how loud a human ear senses music. CoolEdit uses RMS, expressed in dBFS (Sine or Square Wave) in its Statistics panel.
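A minimal sketch of such an RMS measurement, in Python, might look like the following. It assumes samples are floats in the (-1.0, +1.0) range and uses a full-scale square wave as the 0 dBFS reference; with a full-scale sine as the reference, all readings would come out about 3 dB higher. The function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square value, averaged over the whole signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_dbfs(samples):
    """RMS level in dB relative to a full-scale square wave (0 dBFS)."""
    level = rms(samples)
    if level == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(level)


# One second of test tones at the CD sample rate.
rate = 44100
sine = [math.sin(2 * math.pi * 441 * n / rate) for n in range(rate)]
square = [1.0 if s >= 0 else -1.0 for s in sine]
```

Both test tones peak at 100%, yet the sine measures about 3 dB below the square wave, which shows that the RMS figure captures something the peak value does not.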

2. How can we automatically find the optimum gain?

I agree that this is not trivial. But just using the peak value could even have the opposite effect to what we intend. Imagine two songs, both with the same subjective loudness. The first is highly compressed (little dynamic range), whereas the second uses all the available dynamic range. All the samples in the first will be very close to its average value, none of them reaching 100%. Peak normalizing that song would make it subjectively much louder. On the other hand, the uncompressed song will most likely already contain some 100% samples, so that normalizing would not change its loudness at all. Net effect: normalizing actually introduces a difference in loudness instead of eliminating it. (In a nutshell: peak normalization will generally make compressed songs appear louder than uncompressed ones.)
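The effect can be reproduced with two tiny synthetic signals. The Python sketch below is only a toy model: the 16-sample "songs" are invented so that both start with the same RMS level, and RMS is used as a stand-in for subjective loudness.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def peak_normalize(samples):
    """Scale so that the highest peak reaches 100%."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]


# "Compressed" song: every sample sits at 50%, none reaches 100%.
compressed = [0.5, -0.5] * 8
# "Dynamic" song: mostly silent, with a few full-scale peaks.
dynamic = [1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0] * 2

before = rms_db(compressed) - rms_db(dynamic)
after = rms_db(peak_normalize(compressed)) - rms_db(peak_normalize(dynamic))
```

Before normalization the level difference is zero; afterwards the compressed signal is about 6 dB hotter, because only it had headroom left below the peaks.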

So, peak detection is not really a good criterion. We want the subjective loudness to be equal, so calculations could be based on what I described under 1. However, I agree that this alone is not good enough, as it could lead to some clipping distortion. What I suggest is that we let the user enter a desired loudness, then check to see how much clipping would occur. If this is over a certain limit, then we can either warn the user, or decrease the target loudness until we fall under that limit.
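The "back off until the clipping is acceptable" idea could be sketched as follows in Python. Everything here is an assumption made for illustration: the limit is expressed as a fraction of clipped samples (0.1% by default), the step size is 0.5 dB, and the function names are invented.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (-inf for silence)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def clipped_fraction(samples, gain):
    """Fraction of samples that would exceed full scale after the gain."""
    over = sum(1 for s in samples if abs(s * gain) > 1.0)
    return over / len(samples)


def choose_gain_db(samples, target_dbfs, max_clipped=0.001, step_db=0.5):
    """Gain (dB) that reaches target_dbfs, reduced until clipping is acceptable.

    Returns (gain_db, clipped_fraction). The loop always ends, because a
    low enough gain clips nothing at all.
    """
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return 0.0, 0.0  # silence: leave it alone
    gain_db = target_dbfs - level
    while True:
        fraction = clipped_fraction(samples, 10.0 ** (gain_db / 20.0))
        if fraction <= max_clipped:
            return gain_db, fraction
        gain_db -= step_db  # lower the target loudness a little


# Hypothetical test signal: one second of a 441 Hz sine peaking at 50%.
rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 441 * n / rate) for n in range(rate)]
modest_gain, modest_clip = choose_gain_db(tone, -6.0)
pushed_gain, pushed_clip = choose_gain_db(tone, 0.0)
```

A front-end could just as well stop at the first failed check and warn the user instead of lowering the target silently; the loop is only one of the two options mentioned above.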

The definition of this limit needs some more experimentation and discussion, I think. Some ideas:
- not more than x samples - which is ~x/44100 seconds - in a row are clipped
- samples are clipped by no more than y%, i.e. no sample is over (100+y)% fullscale
- clipped energy (max(sample-fullscale, 0), summed up over the whole waveform) is below z.
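The three candidate criteria in the list above can all be measured in one pass over the samples. The Python sketch below is an assumption-laden illustration: the function and key names are made up, and the thresholds x, y and z are deliberately left to the caller because the thread has not settled on them.

```python
def clip_metrics(samples, gain, full_scale=1.0):
    """Measure how badly a signal would clip after applying a linear gain.

    Returns a dict with:
      longest_run       - most consecutive clipped samples (criterion x)
      overshoot_percent - worst overshoot in % of full scale (criterion y)
      clipped_sum       - sum of max(|sample| - full_scale, 0) (criterion z)
    """
    longest_run = 0
    run = 0
    worst_overshoot = 0.0
    clipped_sum = 0.0
    for s in samples:
        excess = abs(s * gain) - full_scale
        if excess > 0.0:
            run += 1
            longest_run = max(longest_run, run)
            worst_overshoot = max(worst_overshoot, excess)
            clipped_sum += excess
        else:
            run = 0  # an unclipped sample ends the current run
    return {
        "longest_run": longest_run,
        "overshoot_percent": 100.0 * worst_overshoot / full_scale,
        "clipped_sum": clipped_sum,
    }


# Hypothetical samples, amplified by a factor of 2 (about 6 dB).
metrics = clip_metrics([0.25, 0.75, 0.625, 0.25, -1.0, 0.125], gain=2.0)
```

Dividing `longest_run` by the sample rate gives the duration of the longest clipped stretch in seconds.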

Oh, and btw: CoolEdit 2000 can already do peak normalization of MP3 files. :-)

Daniel


Posted by: bonzi

Re: Audio Volume Balance - 27/02/2000 08:05

What a fascinating problem. This really cuts to the root of the issues of digital audio production. Why are recent CDs mastered so differently than older CDs? What is it about the mastering process that makes a disc sound good or bad? Proper compression is truly an art form.

Indeed. I have a live album by Pete Seeger (actually, quite a few of them, but one is afflicted by this problem more than others). It is 'peak normalised', but the peaks are the sound made by Pete's chair being pulled across the floor as he changed his position between songs. Songs themselves are almost inaudible.

But there is a problem with properly mastered albums, as well. A look at Car&Driver tells me that a car with noise level at highway cruising speed much lower than 70 dBA is a quiet car. That means we have some 50 dB between background noise level and bursting of speaker membranes / car windshield / our eardrums (while in a concert hall or a quiet listening room we have almost 100 dB). I am sure that intended difference between barely audible choir at the beginning of Tchaikowsky's 1812 and cannons later on is more than 50 dB. (There's a lot of classical pieces with great dynamic range; Dvorak's 'New World' symphony is one of my favourites I cannot listen to in the car without frequent volume adjustments.)

So, what do we do? Does anybody know of a 'compressor' for wav files? Maybe we should put real-time dynamics compression using empeg's DSP on the wish list?

Comments?

Cheers!

Dragi "Bonzi" Raos
Zagreb, Croatia
#5196

Posted by: tfabris

Re: Audio Volume Balance - 27/02/2000 10:46


Oh, and btw: CoolEdit 2000 can already do peak normalization of MP3 files.

Without decompressing and recompressing them? Jeez, why am I still using Cool Edit 96, then??!?!?!?!



Tony Fabris
Empeg #144

Posted by: tfabris

Re: Audio Volume Balance - 27/02/2000 11:14

It is 'peak normalised', but the peaks are the sound made by Pete's chair being pulled across the floor as he changed his position between songs. Songs themselves are almost inaudible.

Too classic. I love it.

Dvorak's 'New World' symphony is one of my favourites I cannot listen to in the car without frequent volume adjustments.

Your point about the car being a bad environment for listening to highly dynamic music is a good one. It makes me rethink my position on Dynamat. Originally, my opinion was that it was a waste of time and money: It's easier and cheaper to buy more powerful amps and speakers than it is to strip down your interior and apply Dynamat. Now I'm not so sure. Highly dynamic music can be a big issue in a noisy car, no matter how loud your system can crank. It's too bad that car noise suppression is always such a trade-off (weight/cost vs. suppression).

The funny thing is that radio stations have known about this for many years. When you listen to the radio, all that music has been run through a compressor before it's broadcast. They do it because they know that people mostly listen to the radio in their cars, and that they need to compress the dynamics for that environment.

And then there's the issue of television broadcasts. Everyone always complains because the commercials are so much louder than the TV shows. Actually, the commercials aren't louder, they're just more dynamically compressed than the shows are. Some new TVs come with a DSP that does compression/limiting to take care of that problem for you. It's funny how something we dislike about television broadcasts becomes desirable in the car.

So, what do we do? Does anybody know of a 'compressor' for wav files?

Of course. It's easy to compress the dynamics of a .WAV file before encoding it into .MP3. Cool Edit has a very nice compression interface, although in my version (96) it's tricky to adjust the parameters correctly. There's no preset just for making a CD sound louder; the presets assume you're working with one voice or one instrument rather than an entire album. So unless you adjust the parameters carefully, you'll get artifacts such as zipper effects and snare drums that sound gated. Like I said, good compression is an art form all by itself.
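To give an idea of the parameters involved, here is a bare-bones dynamics compressor sketched in Python. It is a generic textbook-style design (an envelope follower with attack and release times driving a threshold/ratio gain curve), not Cool Edit's algorithm, and the default values are arbitrary assumptions.

```python
import math


def compress(samples, rate=44100, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=200.0):
    """Reduce the gain of everything whose envelope exceeds the threshold.

    Above the threshold, each `ratio` dB of input level yields 1 dB of
    output level. Attack and release set how fast the envelope follower
    reacts to rising and falling levels.
    """
    attack = math.exp(-1.0 / (rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Follow rising levels quickly (attack), falling ones slowly (release).
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        env_db = 20.0 * math.log10(envelope) if envelope > 0.0 else -120.0
        over_db = env_db - threshold_db
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        out.append(s * 10.0 ** (gain_db / 20.0))
    return out


# Hypothetical input: a loud passage and a quiet passage, one second each.
loud = compress([1.0] * 44100)
quiet = compress([0.05] * 44100)
```

After compression, make-up gain is usually added to bring the peaks back up, which is what makes the result seem louder overall. Release times that are too short for the material are one source of the pumping and gated-sounding artifacts mentioned above.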

There's no way to compress the dynamics of an MP3 after it's been encoded, of course...

Maybe we should put real-time dynamics compression using empeg's DSP on the wish list?

As I recall, someone already brought that up. But I've read the specs on the DSP, and if I remember correctly, it doesn't include a compressor. Any dynamics compression would have to be done realtime by the Empeg player software. So don't hold your breath.



Tony Fabris
Empeg #144

Posted by: tadzio

Re: Audio Volume Balance - 27/02/2000 12:56

Without decompressing and recompressing them? Jeez, why am I still using Cool Edit 96, then??!?!?!?!

Well, I didn't say "without decompressing and recompressing".... in fact, it includes the Fraunhofer encoder/decoder, and it decodes the file first, then performs whatever operation you want, then encodes it again. I also didn't say this was lossless...

Daniel


Posted by: bonzi

Re: Audio Volume Balance - 27/02/2000 13:50

Cool Edit has a very nice compression interface.

Ah, professionals! I meant a free compression tool.

Anyway, do you think Cool Edit 2000 is worth the price (for a definite amateur - I won't be making any music of my own, but might like to transfer my vinyl records)? The Fraunhofer encoder, the various filtering options etc. tempt me, and the 'grown up' version is certainly too much for me.

Out of curiosity, do you know any site where I could find out about dynamic range compression algorithms?

Thanks!

Dragi "Bonzi" Raos
Zagreb, Croatia
#5196

Posted by: bryan

Re: Audio Volume Balance - 28/02/2000 03:25

Tweaking the mp3 files themselves sounds like a definite option, but I don't like the idea of the clipping and thus quality loss that might occur.

I would think that the gain should really be adjusted at the final amplification stage of the empeg itself.
Each file could be tagged with a +ve or -ve adjustment at whatever granularity the output stage allows. This way you could subjectively adjust each file until you get your whole collection just how you want it.
Additionally any of the suggested algorithms for finding the optimum gain could be run over the collection and used as a starting point.

This way you should only have a problem if your adjustment factor for a track pushes the total gain off the top or bottom of the scale.
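The per-track tag idea can be sketched in a few lines of Python. This is purely hypothetical: the thread only establishes that 0 dB is the top of the empeg's volume scale, so the -60 dB floor, the function name and the return format are assumptions made for the example.

```python
def effective_volume(user_db, track_offset_db, min_db=-60.0, max_db=0.0):
    """Combine the volume knob setting with a per-track offset tag.

    Returns (applied_db, was_limited). was_limited is True when the sum
    fell off the top or bottom of the scale and had to be clamped.
    """
    wanted = user_db + track_offset_db
    applied = max(min_db, min(max_db, wanted))
    return applied, applied != wanted


# A quiet album tagged +4 dB, played at two different knob settings.
print(effective_volume(-10.0, 4.0))  # fits inside the scale
print(effective_volume(-2.0, 4.0))   # would need +2 dB, clamped at the top
```

Because the offset is applied at the output stage, the MP3 data itself is never altered and nothing is clipped digitally; the only failure mode is running out of range at either end of the scale.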