Well, it's all working as well as it needs to be without changing the environment variables.

Patching Cygwin for UTF-8 and saving the scripts as UTF-8 without a BOM let me use UTF-8 characters. xmlstarlet also seems to pick up certain HTML entities and convert them automatically: &amp; already comes back as &.

Using gawk to replace characters, including those that are invalid in filenames, leaves me with a string that works for what I need: file names and folder names that stay fully legible.
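For illustration, here is a rough sketch of that kind of character replacement. It is written in Python rather than gawk and is not the script described above. The invalid-character set assumes Windows naming rules, and the replacement table (and the `sanitize_name` helper) is hypothetical, chosen only to keep names readable:

```python
import re

# Characters Windows rejects in file/folder names, plus ASCII control chars
# (assumed target set; adjust for your filesystem).
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Hypothetical readable substitutes; anything not listed becomes "_".
_REPLACEMENTS = {
    ":": " -",
    "/": "-",
    "\\": "-",
    "|": "-",
    "?": "",
    "*": "",
    '"': "'",
    "<": "(",
    ">": ")",
}


def sanitize_name(title: str) -> str:
    """Turn an arbitrary title into a legible file or folder name."""
    cleaned = _INVALID.sub(lambda m: _REPLACEMENTS.get(m.group(0), "_"), title)
    # Collapse whitespace runs, then trim trailing dots/spaces,
    # which Windows also rejects at the end of a name.
    cleaned = re.sub(r"\s+", " ", cleaned).strip().rstrip(". ")
    return cleaned or "_"


print(sanitize_name('Episode 3: "What?" / Part <1>'))
# prints: Episode 3 - 'What' - Part (1)
```

Non-ASCII characters such as "é" pass through untouched, so this only works out if the surrounding environment handles UTF-8 correctly.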

If this were something I was going to distribute, I'd go about it differently, and if I needed to host it remotely, I'd definitely do it in PHP. I did find a handy PHP function that parses an XML document into a (potentially large) array.

Thanks a lot for your help guys! I don't think I would have been able to get through all this without it.
_________________________
Bruno
Twisted Melon : Fine Mac OS Software