Okay, how about this... Speech recognition is hard, but text-to-speech is easy. So, along the lines of the "spell by" thing: the operator spells out the (artist | title | source | year | genre) they want, and as soon as the player has an exact match it speaks that match back to the operator. No need to look at the interface.
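To make the idea concrete, here is a rough sketch in Python. It is only an illustration, not how the player actually works. It assumes "exact match" means the spelled letters have narrowed the list down to a single entry, and the names `spell_search`, `entries`, and `speak` are all made up for this example. `speak` stands in for whatever text-to-speech call the player would have.

```python
def spell_search(entries, letters, speak):
    """Feed spelled letters one at a time; speak the entry once only one is left.

    entries: list of strings for one field (e.g. all artist names) -- hypothetical
    letters: the characters the operator spells, in order
    speak:   callback standing in for a text-to-speech engine -- hypothetical
    """
    prefix = ""
    for ch in letters:
        prefix += ch.lower()
        # Keep only entries that start with what has been spelled so far.
        matches = [e for e in entries if e.lower().startswith(prefix)]
        if len(matches) == 1:
            # Assumption: a single remaining candidate counts as the exact match.
            speak(matches[0])
            return matches[0]
        if not matches:
            # Nothing starts with this prefix, so stop early.
            return None
    # Ran out of letters while several candidates were still possible.
    return None


if __name__ == "__main__":
    artists = ["Radiohead", "Ramones", "Rush"]
    # print() is used here in place of a real speech engine.
    spell_search(artists, "ram", print)
```

The same function would work for title, source, year, or genre by passing a different list of entries.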
_________________________
- Tony C
my empeg stuff