[Project_owners] Detecting file charset

Zachary Carter zack.carter at gmail.com
Fri Feb 16 08:53:26 PST 2007


On 2/14/07, Karsten Düsterloh <mnenhy at tprac.de> wrote:
> Well, how should this work?
> If all characters are below 0x80, it's most probably(!) ASCII, and else?
Can JavaScript do this kind of comparison? I assume you'd have to
check every character, and even then the result wouldn't be reliable.
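
For what it's worth, the check itself is easy to write. Here is a minimal
sketch, assuming the file's raw bytes are already available as a Uint8Array
(how they get read is left out). The name isProbablyAscii is made up for
illustration.

```javascript
// Hypothetical helper: returns true when every byte is below 0x80.
// This only says "nothing rules out ASCII"; it does not prove it.
function isProbablyAscii(bytes) {
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) {
      return false; // high bit set, so this is not plain ASCII
    }
  }
  return true; // note: an empty input also ends up here
}

const plain = Uint8Array.from([0x48, 0x69]);    // the bytes of "Hi"
const accented = Uint8Array.from([0x48, 0xe9]); // "H" followed by 0xE9

console.log(isProbablyAscii(plain));
console.log(isProbablyAscii(accented));
```

A false result only says the data is not ASCII. It gives no hint about which
of the many 8-bit or multi-byte charsets was actually used.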

> What makes 0xA4 be a euro sign instead of mere currency symbol?

I'm not sure if these are rhetorical questions or not. :(
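
The 0xA4 question can at least be made concrete. In ISO-8859-1 that byte is
the generic currency sign (U+00A4), while in ISO-8859-15 it is the euro sign
(U+20AC). The sketch below decodes the same bytes under both charsets. The
function name decodeSingleByte and the override table are made up for
illustration, and only these two charsets are handled.

```javascript
// Byte positions where ISO-8859-15 differs from ISO-8859-1,
// mapped to the Unicode code point ISO-8859-15 assigns.
const LATIN9_OVERRIDES = {
  0xa4: 0x20ac, // euro sign instead of currency sign
  0xa6: 0x0160,
  0xa8: 0x0161,
  0xb4: 0x017d,
  0xb8: 0x017e,
  0xbc: 0x0152,
  0xbd: 0x0153,
  0xbe: 0x0178,
};

// Hypothetical helper: decode bytes as ISO-8859-1 or ISO-8859-15.
// ISO-8859-1 maps each byte to the code point with the same value.
function decodeSingleByte(bytes, charset) {
  let out = "";
  for (const b of bytes) {
    const override =
      charset === "ISO-8859-15" ? LATIN9_OVERRIDES[b] : undefined;
    out += String.fromCharCode(override !== undefined ? override : b);
  }
  return out;
}

const price = [0x35, 0xa4]; // "5" followed by the ambiguous byte

console.log(decodeSingleByte(price, "ISO-8859-1"));
console.log(decodeSingleByte(price, "ISO-8859-15"));
```

Both results are valid text, so nothing in the bytes themselves says which
reading was intended. That is why a detector has to guess.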

I have found some interesting things, though. There is the Universal
Encoding Detector, written in Python:
http://chardet.feedparser.org/docs/how-it-works.html

Its documentation links to this article:
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

I haven't read them yet, but they seem promising. I know Mozilla has some
detection built in, but it doesn't seem to expose a scriptable interface.

