Unofficial empeg BBS

Page 1 of 2
#89174 - 18/04/2002 22:23 Text to Speech on the Empeg - Let's do it!
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
Okay folks, this thread got me thinking about text to speech again. It's a huge wish of mine to get practical text to speech working on the Empeg. The applications are numerous, from having it read driving directions, to voice prompting in the player app, to more frivolous applications like having my Trivia game read you the questions (my original reason for looking into this.)

Here are my findings so far:

1. The *only* text-to-speech engine we even need to be thinking about is flite. It's probably the only one small enough to run on the Empeg, and it does a very good job. The version 1.1 binary release comes with a 16 kHz voice that sounds pretty damn good. Flite is 90% of the puzzle, and it's open source and free.

2. The other 10% of the puzzle comes from the fact that Flite can't write to the Empeg's sound device due to the device's peculiar buffer and sample rate requirements. This means that the raw sound data produced by Flite needs to be upsampled from 16 kHz to 44.1 kHz and then written to the sound device the way the Empeg expects it to be written (4608 bytes at a time.) These limitations aside, flite is amazing. Running on the Empeg (but without the player app running) the flite engine did text-to-speech of the first paragraph of the GNU General Public License in less than 4 seconds (the paragraph when spoken is at least 20 seconds long.)

3. So tonight, inspired by the aforementioned PhatBox thread, I dug deeper on the web and finally found some sample rate conversion software that will do what we need it to do pretty easily. It takes WAV on stdin and writes it on stdout after some sample rate conversion that seems to be both high quality and pretty fast.

4. Using the above programs (flite and rateconv) I can generate a WAV file that the trusty ol' pcmplay example program can play to the Empeg's sound device. So we have this chain working:

source text --> flite --> wav file ---> rateconv --> pcmplay --> empeg sound output

In the UNIX shell, it looks something like this:


flite16k "Pink Floyd. Another Brick in The Wall Part II. 1979" -o test.wav;
rateconv -m 16000 7200 400 65 5 1 0.8 < test.wav | pcmplay


When I run this and the player isn't running, the whole process is very quick. Maybe two seconds, which is shorter in duration than the sound output itself. However, it's far from real time due to the fact that it's running the flite engine, writing to a file, then reading that file into the sample rate converter, which passes its stdout into the stdin of pcmplay.
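For anyone who wants to poke at the chain above without retyping it, here is a minimal sketch that wraps the same three steps in one shell function. It assumes flite16k, rateconv and pcmplay are on the PATH exactly as in the commands above; the FLITE, RATECONV and PCMPLAY variables and the scratch file name are hypothetical additions so the function can be dry-run with stand-in commands. The rateconv arguments are copied verbatim from the post, not re-derived.

```shell
#!/bin/sh
# Sketch only: wraps the flite -> wav file -> rateconv -> pcmplay chain.
# The command names are overridable (hypothetical variables) so the
# function can be exercised on a machine without the empeg tools.
FLITE=${FLITE:-flite16k}
RATECONV=${RATECONV:-rateconv}
PCMPLAY=${PCMPLAY:-pcmplay}

say() {
    # Scratch WAV file, one per shell process (hypothetical name).
    wav="${TMPDIR:-/tmp}/say.$$.wav"

    # Step 1: synthesize the text into the scratch WAV file.
    "$FLITE" "$1" -o "$wav" || { rm -f "$wav"; return 1; }

    # Step 2: resample and play (arguments copied from the post above).
    "$RATECONV" -m 16000 7200 400 65 5 1 0.8 < "$wav" | "$PCMPLAY"
    status=$?

    # Step 3: clean up the scratch file whether or not playback worked.
    rm -f "$wav"
    return "$status"
}
```

With the real tools installed, `say "Pink Floyd. Another Brick in The Wall Part II. 1979"` should behave like the two-line version, just without leaving test.wav behind.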

That's not ideal, of course. The Holy Grail is to modify the Flite source code to use the Empeg's sound device directly, and to graft in the sample rate conversion code from rateconv so that the output doesn't sound like a 33 RPM record going at 78 RPM. The other modification would be to run this modified Flite under the realtime round-robin scheduler so that it can play nicely while the player app is running.
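To put rough numbers on what that integration has to do, here is a small arithmetic sketch in plain shell. The 16 kHz, 44.1 kHz and 4608-byte figures come from point 2 above; the 4 bytes per frame (16-bit stereo) is an assumption about the Empeg's sound device, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: sizes involved in feeding 16 kHz speech to a 44.1 kHz device.
IN_RATE=16000      # flite voice rate, from the post
OUT_RATE=44100     # rate the sound device wants, from the post
BUF_BYTES=4608     # bytes per write, from the post
FRAME_BYTES=4      # ASSUMPTION: 16-bit samples, two channels

# Greatest common divisor by Euclid's algorithm.
gcd() {
    a=$1; b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b)); a=$b; b=$t
    done
    echo "$a"
}

g=$(gcd "$OUT_RATE" "$IN_RATE")
UP=$((OUT_RATE / g))                        # output frames per DOWN input samples
DOWN=$((IN_RATE / g))
FRAMES_PER_BUF=$((BUF_BYTES / FRAME_BYTES)) # frames in one device write

# Number of device writes needed for N input samples, rounding up
# (the last buffer would need padding with silence).
buffers_for() {
    out_frames=$(( ($1 * UP + DOWN - 1) / DOWN ))
    echo $(( (out_frames + FRAMES_PER_BUF - 1) / FRAMES_PER_BUF ))
}

echo "resample ratio: $UP/$DOWN"
echo "frames per buffer: $FRAMES_PER_BUF"
```

Under those assumptions the conversion ratio is 441/160, each write carries 1152 frames (about 26 ms of audio), and 20 seconds of speech (320000 input samples) works out to 766 writes.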

I was hoping to make all this happen, but I don't think I'm the guy to do it. With sufficient hacking and thrashing about, I can probably do it. But I know there are people out there who are reading this who are better equipped to take this on. If you're one of those people, please raise your hand. Or if you have anything else to say on this topic, let's hear it. Quasi-realtime TTS on the Empeg would be useful for dozens of applications, and all the software we need is already there. We just need to integrate it all and make it one "speech server" that other user apps can connect to.

So who's in?
_________________________
- Tony C
my empeg stuff

#89175 - 18/04/2002 22:39 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
Terminator
old hand

Registered: 12/01/2000
Posts: 1079
Loc: Dallas, TX
Doesn't Kim Salo's GPS project use voice prompts?

#89176 - 18/04/2002 22:45 Re: Text to Speech on the Empeg - Let's do it! [Re: Terminator]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
Yup, but they're prerecorded, not realtime, on-the-fly. I want realtime TTS and all the ingredients are there.
_________________________
- Tony C
my empeg stuff

#89177 - 19/04/2002 00:12 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
That would be so cool... Just add it to the ever growing list of things I want to have a crack at.

Exactly what were you going to use the TTS for? I can see song titles/album names etc but what else?
_________________________
Christian
#40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)

#89178 - 19/04/2002 00:32 Re: Text to Speech on the Empeg - Let's do it! [Re: Shonky]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
When you say speech server, that could be implemented in a device ala /proc/kernel which lets you upgrade the kernel. You could have /proc/tts and then to say something you would/could simply go:

echo "This is the empeg talking" >/proc/tts

or the equivalent in the userland program. So this might be possible for a hijack thing.
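As a rough idea of what the userland equivalent could look like, here is a sketch that uses a named pipe instead of a /proc entry. Everything in it is an assumption for illustration: the /tmp/tts path, the SPEAK variable and the function name are made up, and it presumes mkfifo is available on the player. SPEAK should name whatever command speaks one line of text (for example a wrapper around the flite, rateconv and pcmplay chain); the default just echoes the line so the script can be dry-run anywhere.

```shell
#!/bin/sh
# Sketch only: a FIFO-based stand-in for the /proc/tts idea.
TTS_FIFO=${TTS_FIFO:-/tmp/tts}   # hypothetical path for the named pipe
SPEAK=${SPEAK:-echo}             # command that speaks one line (dry-run default)

# Serve one writer: read lines from the FIFO until the writer closes it.
tts_serve_once() {
    # Create the FIFO if it does not exist yet.
    [ -p "$TTS_FIFO" ] || mkfifo "$TTS_FIFO" || return 1

    # Opening the FIFO blocks until some process opens it for writing.
    while IFS= read -r line; do
        if [ -n "$line" ]; then
            # Keep the handler away from the FIFO so it cannot eat input.
            "$SPEAK" "$line" < /dev/null
        fi
    done < "$TTS_FIFO"
}
```

Running `while :; do tts_serve_once; done &` would keep it listening, and then `echo "This is the empeg talking" > /tmp/tts` works much like the /proc/tts example above.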
_________________________
Christian
#40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)

#89179 - 19/04/2002 04:24 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
rob
carpal tunnel

Registered: 21/05/1999
Posts: 5335
Loc: Cambridge UK
The problem with flite is that it sounds pretty awful, and the car environment is one place you need good quality speech. There are a few very good quality commercial systems, but the problem with them (apart from being commercial) is their large footprint.

Of course, the engine itself can be replaced later in the project if something better comes along. It'll be good to get the basic infrastructure in place with what's available now.

Rob

#89180 - 19/04/2002 04:35 Re: Text to Speech on the Empeg - Let's do it! [Re: rob]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
Bummer. I hadn't got to try it yet. I assumed it sounded OK from what Tony was saying. Is it the standard very robotic computer voice?
_________________________
Christian
#40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)

#89181 - 19/04/2002 05:09 Re: Text to Speech on the Empeg - Let's do it! [Re: rob]
thenominous
member

Registered: 22/12/2001
Posts: 189
Loc: UK
I assume that Festival falls into the "too large" category, or possibly the "too much CPU time required" one?

#89182 - 19/04/2002 05:16 Re: Text to Speech on the Empeg - Let's do it! [Re: Shonky]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
that could be implemented in a device ala /proc/kernel which lets you upgrade the kernel

Exactly what I had in mind.
_________________________
- Tony C
my empeg stuff

#89183 - 19/04/2002 05:34 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
Does it really sound that bad yn0t?
_________________________
Christian
#40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)

#89184 - 19/04/2002 05:35 Re: Text to Speech on the Empeg - Let's do it! [Re: rob]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
The problem with flite is that it sounds pretty awful

Well, I'm not sure if you've used v1.1 yet, but it comes with a 16 kHz voice that isn't all that bad. It's not perfect; in fact, it's definitely stripped down from Festival, and it's not going to compete with something like AT&T's Natural Voices or anything, but *it works* and it works right now, and it's getting better.

I wrote the author a while back and he does have a commercial version of flite available via his website http://www.cepstral.com/. But for me, flite does an admirable job considering it's running on the Empeg in real time.

I'm attaching a sample mp3 file for people to make their own judgements. No, it doesn't sound like a real person talking, and yes, the commercial systems are advanced, but I'm not imagining anyone getting them working on the Empeg anytime soon. I personally think the TTS provided by flite would be perfectly acceptable for voice prompts reading you the artist/title, playing GPS directions, etc. But that's just my opinion.



Attachments
87316-flitetest.mp3 (129 downloads)

_________________________
- Tony C
my empeg stuff

#89185 - 19/04/2002 05:41 Re: Text to Speech on the Empeg - Let's do it! [Re: Shonky]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
Bummer. I hadn't got to try it yet. I assumed it sounded OK from what Tony was saying. Is it the standard very robotic computer voice?

I would say it's three steps above my answering machine's voice, but certainly not as advanced as other TTS systems I've tried on my PC. I'm not sure if this is a limitation, or if they just don't want to ship flite with a very high quality voice so they can sell their commercial product instead, but it sure does sound robotic.

However, it does a very good job at analyzing sentence structure, inserting pauses, etc. In the example MP3, I didn't have to insert any words phonetically to make it sound right, or tell it where to pause in sentences, etc... I think it would do a good enough job that it would be useful in a variety of applications.
_________________________
- Tony C
my empeg stuff

#89186 - 19/04/2002 05:44 Re: Text to Speech on the Empeg - Let's do it! [Re: thenominous]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
I assume that festival falls into the "too large" category, or possibly the too much CPU time required?

Ohhhhhhhhhhh yeah. I took a stab at Festival but it doesn't even come CLOSE to running in the Empeg's memory footprint. So CPU isn't even an issue.

Festival is the Emacs of TTS systems. It uses a bunch of LISP files to define its various diphone and speech engines... It's amazingly configurable and tweakable, but massive and bulky. The author saw this as an opportunity to make the same kind of thing work on smaller platforms, and that's Flite. LISP is replaced by tight C code. I think it's a great effort.
_________________________
- Tony C
my empeg stuff

#89187 - 19/04/2002 05:58 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
I assume Flite is Festival Lite then? The demo doesn't sound fantastic. The demos on www.cepstral.com sound really good though in my opinion. Pity it's not free....
_________________________
Christian
#40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)

#89188 - 19/04/2002 06:43 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
genixia
Carpal Tunnel

Registered: 08/02/2002
Posts: 3411
Hey great, we can get Stephen Hawking on our empegs
_________________________
Mk2a 60GB Blue. Serial 030102962 sig.mp3: File Format not Valid.

#89189 - 19/04/2002 08:15 Re: Text to Speech on the Empeg - Let's do it! [Re: Shonky]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
I didn't say it sounded great, I said it works.

Fine, guys, wait for perfectly human-sounding TTS to suddenly appear and be able to run in the Empeg's limited resources.

Incidentally, a while back I spoke with Alan Black, author of Flite. He seemed very interested in selling me Cepstral. I didn't follow up on it because I had other things to do at the time. The thing is I don't think Cepstral has source code available, which would mean we'd have to work out a way to modify it (or have them add support for the Empeg's soundcard and for up-sampling the sound).

If anyone really wants Cepstral I can try to resume dialog with him. The thing is I think we should start working with what we have. Too often around here people want the Mercedes without trying things out on the Chevy Impala first.
_________________________
- Tony C
my empeg stuff

#89190 - 19/04/2002 08:16 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31565
Loc: Seattle, WA
Yeah, but won't you be infringing on a patent now if you do this?
_________________________
Tony Fabris

#89191 - 19/04/2002 08:18 Re: Text to Speech on the Empeg - Let's do it! [Re: tfabris]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
Yeah, but won't you be infringing on a patent now if you do this?

I think I'll get over the mental anguish.
_________________________
- Tony C
my empeg stuff

#89192 - 19/04/2002 10:47 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
rob
carpal tunnel

Registered: 21/05/1999
Posts: 5335
Loc: Cambridge UK
Fine, guys, wait for perfectly human-sounding TTS to suddenly appear and be able to run in the Empeg's limited resources.

That'll be sometime around Q3 then.. but I suspect it'll take a bit longer for a free package to meet those requirements.

Rob

#89193 - 19/04/2002 11:08 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
jwtadmin
enthusiast

Registered: 05/09/2000
Posts: 210
Loc: Ipswich, MA
I think that this is a great idea, and I would love to see it implemented. This would be great for announcing tracks, say for example if your empeg was in a convertible and you couldn't read the display.

Or definitely for the trivia game, which I am eagerly awaiting.

The voice is shaky but as good as my TI-99/4A, and if a better voice comes out then we can upgrade.
_________________________
___ John Turner "It's easier to ask for forgiveness than to ask for permission"

#89194 - 19/04/2002 12:57 Re: Text to Speech on the Empeg - Let's do it! [Re: tonyc]
snoopstah
enthusiast

Registered: 07/01/2002
Posts: 337
Loc: Squamish, BC
Mmmm... it would probably sound better if you knew what it was meant to say!

But it could be pretty good - ideally, you'd have an option in emplode or similar to type in an alternative album/track name for it to say, if it messed up something really big - and otherwise it would just say the stored name.
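One way that override idea might be sketched, assuming the alternative names end up in a plain text file on the player rather than inside emplode itself: one line per entry, stored name and spoken name separated by a tab. The file format, the function name and the sample entry in the note below are all hypothetical.

```shell
#!/bin/sh
# Sketch only: look up an alternative spoken form for a stored name.
TAB=$(printf '\t')

# $1 = name as stored in the database, $2 = overrides file.
# Prints the override if one exists, otherwise the stored name unchanged.
spoken_name() {
    if [ -r "$2" ]; then
        while IFS=$TAB read -r stored spoken; do
            if [ "$stored" = "$1" ]; then
                printf '%s\n' "$spoken"
                return 0
            fi
        done < "$2"
    fi
    printf '%s\n' "$1"
}
```

So with a line like "INXS<tab>in excess" in the file, the speech engine would be handed "in excess", and anything without an entry would just be passed through as stored.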

Cheers,

A.
_________________________
Empeg Mk2a 128G with amber lit buttons kit - #30102490

PhotoVancouver | Squamish, BC Webcam | Personal Website

#89195 - 19/04/2002 14:51 Re: Text to Speech on the Empeg - Let's do it! [Re: Shonky]
grgcombs
addict

Registered: 03/07/2001
Posts: 663
Loc: Dallas, TX
No, not really. The key is it works, now. It works well enough for our early adopter purposes. It's on par with Apple's Text To Speech in about 1998, and it takes up nothing like the footprint Apple's did.

Greg
_________________________

#89196 - 19/04/2002 15:28 Re: Text to Speech on the Empeg - Let's do it! [Re: grgcombs]
TommyE
enthusiast

Registered: 08/06/1999
Posts: 356
Loc: NORWAY
Or Amiga in early 1986....


TommyE

#89197 - 19/04/2002 15:46 Re: Text to Speech on the Empeg - Let's do it! [Re: jwtadmin]
rob
carpal tunnel

Registered: 21/05/1999
Posts: 5335
Loc: Cambridge UK
The problem comes with proper nouns - have it speak a selection of artist names and album titles. We've tested this a lot - even the best commercial solution is only just acceptable. This is OK if you know what it's meant to be saying, but for eyes-up navigation it's a bit of a pain.

This area of technology does seem to be advancing more rapidly now than ever before (both in the open source and commercial worlds) - which is nice.

Rob

#89198 - 19/04/2002 15:52 Re: Text to Speech on the Empeg - Let's do it! [Re: TommyE]
ninti
old hand

Registered: 28/12/2001
Posts: 868
Loc: Los Angeles
I was thinking older still. It is amazing that it really sounds little better than what a 2 MHz computer could make 20 years ago.

"H'ello, my name is Sam, I am a speech synthesizer for the Ataaari home compuuuter"

20 years....Man, I am getting old.
_________________________
Ninti - MK IIa 60GB Smoke, 30GB, 10GB

#89199 - 19/04/2002 15:58 Re: Text to Speech on the Empeg - Let's do it! [Re: ninti]
rob
carpal tunnel

Registered: 21/05/1999
Posts: 5335
Loc: Cambridge UK
Those old speech synths were mostly getting fed phonetic data, which is an altogether easier proposition than true TTS.

Rob

#89200 - 19/04/2002 16:08 Re: Text to Speech on the Empeg - Let's do it! [Re: rob]
ninti
old hand

Registered: 28/12/2001
Posts: 868
Loc: Los Angeles
True enough, they did sound better with phonetic spelling, but Sam would read ordinary typed speech as well, and it wasn't that much worse... i.e., it got most of the major curse words correctly, which is of course worth literally hours of amusement to junior high kids.
_________________________
Ninti - MK IIa 60GB Smoke, 30GB, 10GB

#89201 - 19/04/2002 16:18 Re: Text to Speech on the Empeg - Let's do it! [Re: ninti]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31565
Loc: Seattle, WA
it got most of the major curse words correctly, which is of course worth literally hours of amusement to junior high kids.

And their parents, too.

I'm not kidding. One day my friend and I came home from school to find his dad sitting at the C-64 typing text into SAM:

"<friend's dad's supervisor's name> IS A FUCKING ASS HOLE"
"<friend's dad's supervisor's name> IS A DIP SHIT"
(etc.)

Seems he'd had a bad day at work...
_________________________
Tony Fabris

#89202 - 19/04/2002 17:53 Re: Text to Speech on the Empeg - Let's do it! [Re: tfabris]
loren
carpal tunnel

Registered: 23/08/2000
Posts: 3826
Loc: SLC, UT, USA
That literally made me gut laugh... picturing an older man sitting at a C-64... LOL. nice.
_________________________
|| loren ||

#89203 - 20/04/2002 10:06 Re: Text to Speech on the Empeg - Let's do it! [Re: loren]
dcosta
enthusiast

Registered: 04/02/2002
Posts: 277
Loc: Massachusetts
I would like to see something where I could specify a sound file to play for each menu item.
You could use all kinds of neat custom sounds for bands...
and maybe even sound clips of the songs for each song....
that would be super neat....
_________________________
__________ davecosta Hijacked 60GB MKIIa 2.0b13
