I'm going way out on a limb here, as I have no idea how the mp3 codec really works.

However, I do see a vague similarity between this idea and VolAdj (which, if memory serves, doubters once said couldn't be done on the empeg; thank you Richard, et al.).

Would it be possible for an algorithm to look a certain number of frames ahead in the stream, compare them, and, if the waveform is mostly static, drop some of those frames? I would think that would shorten longer vowels and silent gaps.
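For what it's worth, here's a rough Python sketch of the idea. It's only a sketch under some big assumptions: it works on audio that has already been decoded to PCM and split into frames (comparing the compressed MP3 frames directly probably wouldn't tell you much), and it uses each frame's RMS level as a crude stand-in for "mostly static". A real version would likely need to compare spectra instead. The function names and thresholds (drop_static_frames, lookahead, tolerance, keep_every) are made up for illustration.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one frame of decoded PCM samples (hypothetical layout)


def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_static(window: Sequence[Frame], tolerance: float) -> bool:
    """Crude 'mostly static' test: every frame's RMS level is within
    `tolerance` of the first frame's level. Silence passes trivially."""
    first = rms(window[0])
    return all(abs(rms(f) - first) <= tolerance for f in window[1:])


def drop_static_frames(frames: List[Frame], lookahead: int = 4,
                       tolerance: float = 0.01, keep_every: int = 2) -> List[Frame]:
    """Look `lookahead` frames ahead; if that window looks static, keep only
    every `keep_every`-th frame of it, otherwise pass one frame through."""
    if lookahead < 1 or keep_every < 1:
        raise ValueError("lookahead and keep_every must be >= 1")
    out: List[Frame] = []
    i = 0
    while i < len(frames):
        window = frames[i:i + lookahead]
        if len(window) == lookahead and is_static(window, tolerance):
            out.extend(window[::keep_every])  # shorten the static stretch
            i += lookahead
        else:
            out.append(frames[i])  # something is changing: keep this frame
            i += 1
    return out


if __name__ == "__main__":
    # 1152 samples per frame, the size of an MPEG-1 Layer III frame.
    silence = [0.0] * 1152
    tone = [0.5, -0.5] * 576
    frames = [silence] * 8 + [tone]
    print(len(frames), "->", len(drop_static_frames(frames)))  # 9 -> 5
```

Chopping out whole frames like this would probably click at the joins, so a short crossfade at each cut would likely be needed too.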

My 8-year-old digital answering machine does this with the blank spaces between words; how hard can it be? (ducking and running)

-jk