I've tried gsub("\u2019","'") for the curved single quote (2019 is the Unicode code point in hex), but with no success. I read about that escape on some page, but my version of awk reports that \u is treated as a plain u, not as an escape for a Unicode hex value.

I have yet to try \x escapes to specify the individual bytes.
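For reference, here is a minimal sketch of the byte-wise approach, assuming the input is UTF-8, where the curved quote U+2019 is encoded as the three bytes E2 80 99. It uses octal escapes (\342\200\231) rather than \x, since octal escapes are part of POSIX awk while \x handling differs between awk implementations. The name fix_quotes is made up for illustration, and LC_ALL=C is set on the assumption that awk should compare raw bytes.

```sh
# fix_quotes is a hypothetical helper name, not part of awk.
# Assumes UTF-8 input: U+2019 is the byte sequence E2 80 99 (octal 342 200 231).
# LC_ALL=C asks awk to treat the input as plain bytes.
# "\047" is the ASCII apostrophe, written in octal to avoid shell quoting trouble.
fix_quotes() {
    LC_ALL=C awk '{ gsub(/\342\200\231/, "\047"); print }'
}

# Usage: the printf below emits "it", the three quote bytes, then "s".
printf 'it\342\200\231s\n' | fix_quotes
# prints: it's
```

If the quote survives this, the bytes reaching awk are probably not the UTF-8 sequence assumed here.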

I've verified that [:alnum:] matches the same characters that a-zA-Z0-9 caught, but it doesn't match accented characters.

I don't know whether the text is being butchered before it gets to awk or what. But awk is probably not seeing each Unicode character as several plain ASCII characters; otherwise I'd have extra characters passing through instead of the accented ones simply being dropped. Arrgh.
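One way to check whether the text is already damaged before awk sees it would be to dump the raw bytes at that point in the pipeline. A small sketch, assuming the text is supposed to be UTF-8; show_bytes is a made-up helper name. An intact precomposed é (U+00E9) should show up as the two bytes c3 a9.

```sh
# show_bytes is a hypothetical helper: hex-dump stdin one byte at a time.
# od -An suppresses the offset column, -tx1 prints each byte as two hex digits.
show_bytes() {
    od -An -tx1
}

# Example: feed it a precomposed e-acute (UTF-8 bytes 0xC3 0xA9).
printf '\303\251' | show_bytes
```

Putting the same dump in place of the awk command shows exactly which bytes awk would receive.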
_________________________
Bruno
Twisted Melon : Fine Mac OS Software