[Project_owners] Supporting multibyte characters

David Costanzo david_costanzo at yahoo.com
Sat May 8 23:47:24 EDT 2004

--- Jaap Haitsma <jaap at haitsma.org> wrote:
> I got a request to support multi byte characters of a japanese user
> of dictionary search. Does anybody know what I should adapt in my

DictionarySearch takes a word and thunks it into a URL, right?

What HTTP server will they be using?  I ask because I think it's the
server's job to decode a multi-byte sequence into its character set,
define the word, and return the result as HTML.  And
then it should be Mozilla's job to determine that the page contains
Japanese and render it properly.  In short, I think whatever server
you use defines how you should encode the URL.

As you probably know, you can encode arbitrary octets in a URL by
converting them to the form "%XX", where XX is the hex value of the
octet.  Section 2.1 of RFC 2396 explains how this relates to non-ASCII
character codings (basically it's currently up to a higher-level
protocol to specify the character coding).  But I'll bet your server
wants the URL encoded in UTF-8.
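
To make the "%XX" notation concrete, here is a minimal Python sketch. The
helper name percent_encode and the sample word are assumptions made for
illustration; nothing here comes from DictionarySearch itself. It escapes
every octet (a real encoder would normally leave ASCII letters and digits
alone) and shows that the same word produces different URL text depending
on the character coding chosen, which is why the server's expectation
matters.

```python
def percent_encode(word: str, charset: str) -> str:
    """Encode `word` in `charset`, then write each octet as %XX."""
    return "".join(f"%{octet:02X}" for octet in word.encode(charset))


word = "日本"  # sample input, chosen only for illustration

utf8 = percent_encode(word, "utf-8")
sjis = percent_encode(word, "shift_jis")
eucjp = percent_encode(word, "euc_jp")

print(utf8)   # %E6%97%A5%E6%9C%AC
print(sjis)   # two octets per character, different from the UTF-8 form
print(eucjp)  # two octets per character, different again
```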

It might be enough for DictionarySearch to convert non-alphanumeric
characters to the octet notation.  But you may also have to convert
the string into UTF-8 (or whatever character coding the server wants).
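
The two steps above (convert the word to the server's character coding,
then escape the octets) could be sketched in Python as follows.  The
function name build_lookup_url and the dictionary.example base URL are
hypothetical, and UTF-8 is only a default guess; the real server decides
which coding it accepts.  In a browser extension, JavaScript's
encodeURIComponent does the UTF-8 variant of the same job.

```python
from urllib.parse import quote


def build_lookup_url(base_url: str, word: str, charset: str = "utf-8") -> str:
    """Append `word` to `base_url`, encoded in `charset` and %XX-escaped."""
    # safe="" also escapes reserved characters such as "/" and "&";
    # ASCII letters, digits and "_.-~" are left as they are.
    return base_url + quote(word, safe="", encoding=charset)


# Hypothetical base URL, for illustration only.
BASE = "http://dictionary.example/define?q="

print(build_lookup_url(BASE, "日本"))
print(build_lookup_url(BASE, "日本", charset="euc_jp"))
```

Keeping the charset as a parameter means a server that wants something
other than UTF-8 only needs a different argument, not different code.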

By the way, if it turns out to be too hard to add support for Japanese to
DictionarySearch and you know of a Japanese DICT server, send the guy
my way.  The DICT protocol has good support for defining non-ASCII
words (although my implementation may need some work).

Best Regards,

David Costanzo

