As I mentioned, I'd love to ditch this whole thing and just write something in straight-up ANSI C. That would likely be best, since it wouldn't rely on external programs or interpreters the way some of the other alternatives do (though I'd need to link in some libraries). I could also go with PHP, but that means installing a lot of stuff on the machine that I don't otherwise need. On the other hand, PHP does have a lot of built-in functions that make handling Unicode and HTML entities a breeze.

It also means redoing this whole thing, including the XML parsing bits, which so far I've just stolen from the existing script.

I think my best bet, since the input is relatively trustworthy, is to filter out only the characters that are not allowed and let everything else through. This is the opposite of what we've been discussing. It seems easy enough, because there are only 9 characters that aren't allowed in a Windows file name, and most of those were already included in the awk sample I posted.

I wanted to give setting awk up for an alternate language one last go, but no matter how much searching I've done, I can't find out how to set the environment variables. Perhaps that information is so basic that no one talks about it. There's lots of discussion about the variables themselves, but I have no idea where to put them. Just stuffing them into the script doesn't seem to work.
_________________________
Bruno
Twisted Melon : Fine Mac OS Software