I mean, we've all talked about how the unit would have to recognize a keyword like empeg to get it started, but what about the end of the command? It might not be clear-cut as to when your instructions end.

This shouldn't be as big a problem as you might think. Most VR nowadays does not listen to your whole sentence and parse it out the way we humans do. It might have an "attention" word, 10-20 fixed "commands", and some of those commands may take qualifiers, which would also be fixed. Basically, VR typically has a very small vocabulary, stored phonetically, from which it picks the "n-best" commands/words in order of confidence level. It can tell from the pause after each word that the word is complete; then it scans its vocabulary of phonemes (is that the right word? - we call them utterances) and decides which word you most likely said. If it doesn't find one with a high enough confidence level, it either skips that word or reprompts.
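
To make the "n-best" idea concrete, here is a rough Python sketch. It is not how any real VR package scores audio: plain string similarity from the standard library's difflib stands in for acoustic matching, and the vocabulary, the function names, and the 0.75 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical fixed vocabulary; a real unit would store phonetic templates.
VOCAB = ["empeg", "play", "pause", "shuffle on", "shuffle off", "next", "previous"]

def n_best(heard, vocab=VOCAB, n=3):
    """Return the n closest vocabulary entries as (confidence, word), best first."""
    scored = [(SequenceMatcher(None, heard.lower(), word).ratio(), word)
              for word in vocab]
    scored.sort(reverse=True)
    return scored[:n]

def recognize(heard, threshold=0.75):
    """Return the top candidate, or None when confidence is too low.

    None is where the unit would skip the word or reprompt.
    """
    confidence, word = n_best(heard)[0]
    return word if confidence >= threshold else None
```

With this, recognize("shufle on") still resolves to "shuffle on", while something far outside the vocabulary comes back as None.
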
What I'm getting to is this:
User: Empeg!
Mk2: (beep)
User: ShuffleOn
Mk2: (turns Shuffle ON)(beep)
User: Play...
Mk2: (now waiting for a song/playlist qualifier)
User: ...Barenaked Ladies
Mk2: (plays my BNL master playlist shuffled) (beep)

There should be a window of a few seconds at this point before the Empeg stops listening for commands, to allow for multiple command requests. Either that, or insert another "User: Empeg!" before the Play command.
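
The exchange above, including the listening window, can be sketched as a small state machine. Again, this is only my guess at the shape of it: the class name, the state names, the command strings, and the 3-second window are assumptions, and timestamps are passed in by hand so the logic is easy to follow.

```python
IDLE, LISTENING, AWAIT_QUALIFIER = "idle", "listening", "await_qualifier"

class CommandSession:
    """Hypothetical dialog loop: attention word, fixed commands, one qualifier."""

    def __init__(self, window=3.0):
        self.window = window      # seconds the unit keeps listening (assumed)
        self.state = IDLE
        self.deadline = 0.0

    def _listen(self, now):
        # (Re)open the listening window after a beep or a completed command.
        self.state = LISTENING
        self.deadline = now + self.window

    def hear(self, word, now):
        """Feed one recognized word; return the resulting action or None."""
        if self.state != IDLE and now > self.deadline:
            self.state = IDLE     # window lapsed, attention word needed again
        if self.state == IDLE:
            if word == "empeg":
                self._listen(now)
                return "beep"
            return None
        if self.state == AWAIT_QUALIFIER:
            self._listen(now)
            return f"play {word}"
        if word == "play":
            # "Play" needs a song/playlist qualifier before anything happens.
            self.state = AWAIT_QUALIFIER
            self.deadline = now + self.window
            return None
        if word in ("shuffle on", "shuffle off"):
            self._listen(now)
            return word
        return None               # not in the vocabulary, ignore it
```

Feeding it "empeg", "shuffle on", "play", "barenaked ladies" a second apart gives the beep, the shuffle action, nothing (waiting for the qualifier), and then "play barenaked ladies". A command that arrives after the window has closed is ignored until "empeg" is heard again.
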

Disclaimer: I have no idea how it's actually going to work on the Mk2, but the VR packages I've used work very similarly to this, because this approach reduces the CPU cycles and RAM required for a workable interface.

_~= Dearing =~_
"WAY too happy about having #99."