#11205 - 12/07/2000 19:17 Re: Speech Processing? [Re: Dignan]

FANTASTIC idea, tanstaafl! Although I imagine they're too far along in the process to do anything about it, and who says they have to take your suggestion anyway? (But think about it... please?)

Rob said ages ago: Whether it will be possible to begin a command session by simply saying "empeg" or somesuch word isn't something we'll know for a while yet. It's more likely that it will be necessary to press a button on a steering wheel remote, in line with other in-car voice systems currently on the market.

Errr... This is actually the way that it will most likely work, and the way that some users (including myself) are hoping it doesn't work :)

#11206 - 13/07/2000 02:17 Re: Speech Processing? [Re: tanstaafl.]
This thread is quite ironic - one of the major stumbling points with the software at the moment is the fact you have to press an attention button! This is quite common in in-car speech recognition, but we're trying hard to move away from this requirement, in line with the customer feedback that we've received up until now.


#11207 - 13/07/2000 05:25 Re: Speech Processing? [Re: dionysus]
Rob said ages ago

Pardon me, sir, for my insubordination, but:
Date of that thread: 11/12/99 12:11 PM
Date I registered: 9/3/00 05:47 AM

I had no idea SR was being discussed that early, and I'm not about to read all ~10,000 posts on this bulletin board. I was just going by what has been discussed since SR was first mentioned in the newsletters, which was much later. In that time period I had not once seen mention of the button idea.

So Rob, what's the problem with the button?



#11208 - 13/07/2000 11:50 Re: Speech Processing? [Re: Dignan]

I mean, we've all talked about how the unit would have to recognize a keyword like empeg to get it started, but what about the end of the command? It might not be clear-cut as to when your instructions end.

This shouldn't be as big of a problem as you might think. Most VR nowadays does not listen to your whole sentence and parse it out like we humans do. It might have an "attention" word, 10-20 fixed "commands", and some of those commands may have qualifiers which would also be fixed. Basically, VR typically has a very small vocabulary, stored phonetically, from which to pick the "n-best" commands/words in order of confidence level. It can tell from the pause after each word that the word is complete, then it scans its vocabulary of phonemes (is that the right word? - we call them utterances) and decides which word you most likely said. If it doesn't find one with a high enough confidence level, it either skips that word or reprompts.
What I'm getting to is this:
User: Empeg!
Mk2: (beep)
User: ShuffleOn
Mk2: (turns Shuffle ON)(beep)
User: Play...
Mk2: (now waiting for a song/playlist qualifier)
User: ...Barenaked Ladies
Mk2: (plays my BNL master playlist shuffled) (beep)

There should be a time of a few seconds now before the Empeg stops listening for commands, to allow for multiple command requests. Either that, or insert another "User: Empeg!" before the play command.

Disclaimer: I have no idea how it's actually going to work on the Mk2, but the VR packages I've used work very similarly to this, because it reduces the amount of CPU cycles/RAM required for a workable interface.
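
For anyone who wants to see the idea in code, here is a rough Python sketch of the kind of small-vocabulary command loop described above. Everything in it is made up for illustration: the names (CommandSession, best_match), the 0.6 confidence threshold, the 3-second listening window, and the command list are all assumptions, and it has nothing to do with how the Mk2 software is actually written. It assumes the recognizer hands back an n-best list of (word, confidence) pairs, best first.

```python
from dataclasses import dataclass, field

# All values below are hypothetical, chosen only for illustration.
ATTENTION_WORD = "empeg"
# Fixed command vocabulary; True means the command needs a qualifier.
COMMANDS = {"shuffleon": False, "shuffleoff": False, "play": True}
MIN_CONFIDENCE = 0.6    # below this, the guess is rejected
LISTEN_WINDOW_S = 3.0   # how long the unit keeps listening after a word


def best_match(n_best, allowed):
    """Pick the first n-best guess that is confident enough and in the
    currently active vocabulary. n_best is sorted best first."""
    for word, confidence in n_best:
        if confidence < MIN_CONFIDENCE:
            break  # every later guess is even less confident
        if word in allowed:
            return word
    return None


@dataclass
class CommandSession:
    playlists: set
    state: str = "idle"  # idle -> listening -> qualifier
    last_heard: float = 0.0
    actions: list = field(default_factory=list)

    def hear(self, n_best, now):
        """Feed one recognized utterance; return the unit's response."""
        # Stop listening if the user has been silent for too long.
        if self.state != "idle" and now - self.last_heard > LISTEN_WINDOW_S:
            self.state = "idle"

        # Only a handful of words are valid in each state.
        if self.state == "idle":
            allowed = {ATTENTION_WORD}
        elif self.state == "listening":
            allowed = set(COMMANDS)
        else:
            allowed = self.playlists

        word = best_match(n_best, allowed)
        if word is None:
            return "ignored" if self.state == "idle" else "reprompt"

        self.last_heard = now
        if self.state == "idle":
            self.state = "listening"
            return "beep"
        if self.state == "listening":
            if COMMANDS[word]:
                self.state = "qualifier"  # e.g. "Play..." waits for a name
                return "waiting"
            self.actions.append(word)
            return "beep"
        # Qualifier state: complete the pending play command.
        self.actions.append("play " + word)
        self.state = "listening"
        return "beep"


if __name__ == "__main__":
    mk2 = CommandSession(playlists={"barenaked ladies"})
    print(mk2.hear([("empeg", 0.9)], now=0.0))
    print(mk2.hear([("shuffleon", 0.8), ("shuffleoff", 0.7)], now=1.0))
    print(mk2.hear([("play", 0.75)], now=2.0))
    print(mk2.hear([("barenaked ladies", 0.7)], now=3.5))
    print(mk2.actions)
```

The point of the sketch is that the active vocabulary shrinks to a few words in each state, so the end of a command is never ambiguous: a command either completes on its own or waits for one qualifier, and the listening window closes after a few seconds of silence.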

#11209 - 13/07/2000 19:03 Re: Speech Processing? [Re: Dignan]
"I had no idea SR was being discussed that early, and I'm not about to read all ~10,000 posts on this bulletin board."

That's what the search function is for.


