Oh ah !

The simple subtraction of [sound digitized from mike] - [music] = [voice]

is a bit more complex, I fear :-)

-1- The first thing is [other noise] which is: [car noise] + [rest of the world noise]

So the new formula is:
[sound digitized from mike] - [music] = [voice] + [other noise]

and of course the sound of the mike depends on the mike hardware and the way you use it ( holding it close or far, or at an angle; this is not only volume, but also a different frequency response and phase shift ), so we have an unknown factor [mike distortion]

The same goes for the music, which is heard as a function of your amplifier, your speakers and your car design, so we have a factor [music distortion] ...

so the new formula is something like:
---------

[sound digitized from mike] - [mike distortion]([music distortion]([music])) = [mike distortion]([voice]) + [mike distortion]([other noise])


--------
Where [a]([b]) is meant to read as: a as a function of b, so [b] is the input and [a]([b]) is the converted signal, the output ...
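To make the formula above a bit more concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the signals are just random numbers, and both [music distortion] and [mike distortion] are modelled as short linear filters (the names `music_fir` and `mike_fir` and their values are made up). Real distortions are surely messier, and the formula only splits up this neatly if [mike distortion] behaves linearly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4800  # 0.1 s at an assumed 48 kHz sample rate

# Stand-in signals (random numbers; real audio would go here).
music = rng.standard_normal(n)
voice = rng.standard_normal(n)
other_noise = 0.3 * rng.standard_normal(n)

# Hypothetical distortions, modelled as short linear FIR filters.
music_fir = np.array([0.8, 0.3, -0.1])   # amp + speakers + car
mike_fir = np.array([0.6, 0.25, 0.05])   # mike hardware + how it is held

def music_distortion(x):
    return np.convolve(x, music_fir)[:len(x)]

def mike_distortion(x):
    return np.convolve(x, mike_fir)[:len(x)]

# What the mike digitizes: everything in the air, passed through the mike.
digitized = mike_distortion(music_distortion(music) + voice + other_noise)

# Left side of the formula: subtract the music as the mike would hear it.
residual = digitized - mike_distortion(music_distortion(music))

# Right side of the formula.
expected = mike_distortion(voice) + mike_distortion(other_noise)

print("formula holds:", np.allclose(residual, expected))
print("naive [digitized] - [music] works:", np.allclose(digitized - music, expected))
```

The point of the second print is that subtracting the raw [music] does not leave the voice behind; you need the distorted music, which is exactly the part nobody knows in advance.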


There are some pretty nasty things involved, and even nastier: the way you use the mike and the way the music is distorted by your system are going to change whenever you hold the mike differently or adjust your system ( new subwoofers ?? new config on the amp ?? ), so it gets really horrible ...
This is nothing that can be solved by pure mathematics. The way I see it, the *first* and *very primitive* way to approach the [mike distortion] and [music distortion] would be to simply measure some values ->
Maybe in this "calibrating" process, the empeg has to play sounds at different frequencies and different volume levels, so maybe play frequencies of:
50 Hz, 200 Hz, 500 Hz, 2 kHz, 5 kHz at volume levels of -20 dB, -15 dB, -10 dB, -5 dB, 0 dB ( protect ears and loudspeakers !! ) ..

So there would be a "grid" of played sounds and corresponding measured values from the mike to get a hint of those nasty distortion functions, which are without a doubt very complex differential ( correct word in English ?? ) functions ...
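A rough sketch of how such a calibration grid could be collected, again in Python with NumPy. This is not anything the empeg actually does: `fake_play_and_record` is a hypothetical stand-in for the whole speaker -> car -> mike chain (here just a gain of 0.5 and a 10-sample delay), and the 48 kHz sample rate and 0.2 s tone length are assumptions. For each grid point it plays a sine tone and estimates gain and phase shift by correlating both signals with a complex probe at the same frequency.

```python
import numpy as np

SAMPLE_RATE = 48000                       # assumed sample rate
FREQS_HZ = [50, 200, 500, 2000, 5000]     # frequencies from the post
LEVELS_DB = [-20, -15, -10, -5, 0]        # volume levels from the post

def make_tone(freq_hz, level_db, duration_s=0.2):
    """Sine tone at the given frequency, scaled to the given level in dB."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    amplitude = 10 ** (level_db / 20)
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

def fake_play_and_record(tone):
    """Hypothetical stand-in for speakers, car and mike: gain 0.5, 10-sample delay."""
    out = np.zeros_like(tone)
    out[10:] = 0.5 * tone[:-10]
    return out

def measure(played, recorded, freq_hz):
    """Estimate gain and phase shift (radians) at freq_hz by correlation."""
    t = np.arange(len(played)) / SAMPLE_RATE
    probe = np.exp(-2j * np.pi * freq_hz * t)
    ratio = np.sum(recorded * probe) / np.sum(played * probe)
    return abs(ratio), np.angle(ratio)

# Build the grid: (frequency, level) -> (gain, phase).
grid = {}
for freq in FREQS_HZ:
    for level in LEVELS_DB:
        played = make_tone(freq, level)
        recorded = fake_play_and_record(played)
        grid[(freq, level)] = measure(played, recorded, freq)

for (freq, level), (gain, phase) in grid.items():
    print(f"{freq:5d} Hz {level:4d} dB  gain={gain:.3f}  phase={phase:+.3f} rad")
```

In this fake system the gain comes out the same at every volume level because the stand-in is perfectly linear. A real amp and speakers probably will not be, which is the reason for having the volume axis in the grid at all.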

So if it doesn't work, you could make the grid finer by using smaller frequency steps and smaller volume steps, and don't forget to take the phase shifting into account ...
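Between grid points you would have to guess the response somehow. One simple possibility (my assumption, not a known empeg method) is to interpolate gain and phase over the logarithm of the frequency. The numbers below are made up for illustration, they are not measurements. Note the phase has to be "unwrapped" first, because a measured phase jumps from -pi to +pi.

```python
import numpy as np

# Made-up coarse grid values at one volume level: frequency, gain, phase (radians).
freqs = np.array([50.0, 200.0, 500.0, 2000.0, 5000.0])
gains = np.array([0.30, 0.55, 0.60, 0.50, 0.35])
phases = np.array([-0.1, -0.4, -1.0, -2.9, 2.6])  # last value has wrapped around

def response_at(freq_hz):
    """Interpolate gain and unwrapped phase at freq_hz on a log-frequency axis."""
    log_f = np.log10(freqs)
    unwrapped = np.unwrap(phases)  # removes the 2*pi jump before interpolating
    gain = np.interp(np.log10(freq_hz), log_f, gains)
    phase = np.interp(np.log10(freq_hz), log_f, unwrapped)
    return gain, phase

gain, phase = response_at(1000.0)
print(f"1 kHz: gain={gain:.3f} phase={phase:+.3f} rad")
```

This also shows why a coarse grid is risky for the phase: if the real phase turns more than half a circle between two neighbouring grid frequencies, the unwrapping guesses wrong and nobody notices. Smaller frequency steps reduce that risk.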



HOPEFULLY the margins and thresholds of the voice recognition are wide enough to simply "download and play" with it, but if not, you would have to do at *least* what I roughly proposed here.
It would be very sad if the empeg people just offered a voice recognition that only works at low empeg volume. That would be close to useless, just being a marketing slogan, but I trust the empeg people by now :-)

Nils

Damn, did I forget something important? That sounds too complicated to me ... :-(