[Padma] supporting more encoding

Nagarjuna Venna vnagarjuna at gmail.com
Tue Feb 21 11:58:37 EST 2006


Hi Karunakar,

There isn't a formal guide; I have been informally helping out people who
are interested in adding support for new fonts. This might be a good
opportunity to do that - here is the first cut.

In general, the complexity of adding a new font to Padma depends on how well
the font is designed. The criterion here is how many of the shapes the font
can render need special rules, compared to how many can be handled by
generic principles. Some of this is script dependent - it seems like
Malayalam and Tamil fonts are much better designed than, say, Telugu or
Devanagari ones. That's not always the case
though - TeluguLipi happens to be one of the cleanest designs I have seen so
far. The worst offenders are Devanagari fonts that try to use the same glyph
for the consonant stem and the vowel sign (maatra) for 'aa'.

The basic structure is as follows - the transformer is responsible for
converting a piece of text in one format into another. The transformer is
controlled by different pieces depending on the application - in the
extension, the code that manipulates the DOM tree controls the transformer.
The transformer makes use of a parser to convert the input text into an
intermediate format. The basic job of the parser is to break input text into
syllables. The intermediate format is then converted into the desired output
format by using a lookup table.
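
To make the flow concrete, here is a minimal sketch of that pipeline. Everything
in it is made up for illustration: the names toyEncoding, toyOutputTable, parse
and transform, the single-letter input codes, and the intermediate tokens are
hypothetical, and none of this is Padma's actual code or intermediate format.

```javascript
// Hypothetical toy encoding: maps font code points to intermediate tokens.
var toyEncoding = {
    lookup: function (str) {
        var table = { "k": "CONS_KA", "r": "CONS_RA", "A": "VOWELSN_AA" };
        return table.hasOwnProperty(str) ? table[str] : null;
    }
};

// Hypothetical output table: intermediate tokens to Unicode (Telugu here).
var toyOutputTable = {
    "CONS_KA": "\u0C15",
    "CONS_RA": "\u0C30",
    "VOWELSN_AA": "\u0C3E"
};

// Parser: break the input into syllables of intermediate tokens.
// In this toy, a syllable is a consonant plus any vowel signs that follow it.
function parse(input, encoding) {
    var syllables = [];
    for (var i = 0; i < input.length; ++i) {
        var token = encoding.lookup(input.charAt(i));
        if (token === null) continue;   // unknown code point: skipped in this toy
        if (token.indexOf("CONS_") === 0 || syllables.length === 0)
            syllables.push([token]);
        else
            syllables[syllables.length - 1].push(token);
    }
    return syllables;
}

// Transformer: parse into the intermediate format, then emit the output
// format through the lookup table.
function transform(input, encoding, outputTable) {
    var out = "";
    var syllables = parse(input, encoding);
    for (var i = 0; i < syllables.length; ++i)
        for (var j = 0; j < syllables[i].length; ++j)
            out += outputTable[syllables[i][j]];
    return out;
}
```

With these toy tables, transform("kAr", toyEncoding, toyOutputTable) goes
through two syllables, [CONS_KA, VOWELSN_AA] and [CONS_RA].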

The parser expects each encoding (e.g. a font mapping) to implement an
interface. In general, we put each encoding in its own .js file and
implement it as a JS object. The attributes to be implemented are:

1. fontFace - how the font is specified in HTML
2. displayName - currently unused (once upon a time, it was used in the UI
for heuristic transformer)
3. script - the script in which the font is typically rendered; the user can
configure from the auto transform whitelist the script in which she wants a
site to be rendered.
4. hasSuffixes - a boolean assumed to be false by default - this is set to
true for fonts for scripts like Devanagari, Gujarati, Kannada etc. These
scripts have complex rules for handling conjuncts that have 'ra' - for
example in arjun, the syllable 'rju' is rendered with the glyph for 'ra'
following the glyph for 'ja'.
5. maxLookupLen - tells the parser how many code points in the input it
should examine before concluding that it has the right mapping. This is the
length of the longest mapping you will write. Ideally, this would be 1 - but
some fonts use as many as 4. (This is used in conjunction with the
isOverloaded() API, see below).

In all the following, str is a sequence of codepoints whose length is <=
maxLookupLen.

6. lookup(str) - return the intermediate format for str in this encoding
7. isPrefixSymbol(str) - prefixes are common to all Indic scripts - for
example the Devanagari vowel sign for 'i'. This API tells the parser if str
is visually rendered before its logical position.
8. isSuffixSymbol(str) - similar to above. Needs to be implemented only if
hasSuffixes is set to true.
9. isOverloaded(str) - if str is part of more than one lookup sequence
return true.
10. handleTwoPartVowelSigns(str1, str2) - lots of vowel signs have more than
one glyph, this API is used to handle them.
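
Putting the ten members together, an encoding object might look roughly like
the skeleton below. The font name ToyFont, its code points, the intermediate
token strings, the script value and the helper longestMatch are all invented
for illustration. How Padma's parser really calls these members, and what
handleTwoPartVowelSigns is expected to return, should be checked against an
existing encoding in the source tree.

```javascript
// Hypothetical encoding object; all values are illustrative only.
var ToyFont = {
    fontFace: "ToyFont",        // how the font is named in HTML
    displayName: "Toy Font",
    script: "Devanagari",       // assumption: Padma may use its own constants here
    hasSuffixes: false,
    maxLookupLen: 2,            // longest key in the table below

    table: {
        "k": "CONS_KA",
        "i": "VOWELSN_I",
        "e": "VOWELSN_E",
        "ea": "VOWELSN_AI"      // two code points forming one mapping
    },

    lookup: function (str) {
        return this.table.hasOwnProperty(str) ? this.table[str] : null;
    },
    // 'i' is drawn before the consonant it logically follows.
    isPrefixSymbol: function (str) { return str === "i"; },
    // Only consulted when hasSuffixes is true.
    isSuffixSymbol: function (str) { return false; },
    // 'e' is a mapping by itself and also the start of 'ea'.
    isOverloaded: function (str) { return str === "e"; },
    // Guess at the contract: combine two glyphs into one vowel sign, or null.
    handleTwoPartVowelSigns: function (str1, str2) {
        return this.lookup(str1 + str2);
    }
};

// Hypothetical helper showing how maxLookupLen and isOverloaded can work
// together: extend the candidate only while it may start a longer mapping.
function longestMatch(encoding, input, pos) {
    var best = null;
    for (var len = 1; len <= encoding.maxLookupLen && pos + len <= input.length; ++len) {
        var str = input.substring(pos, pos + len);
        var token = encoding.lookup(str);
        if (token !== null) best = { str: str, token: token };
        if (!encoding.isOverloaded(str)) break;
    }
    return best;
}
```

In this toy, longestMatch(ToyFont, "kea", 1) looks past 'e' because it is
overloaded and settles on the two-code-point mapping 'ea', while
longestMatch(ToyFont, "kea", 0) stops at 'k' after a single lookup.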

Currently, parsing is done in two phases - redundant code points are removed
and syllables are then extracted. In some cases, it may make sense to
rewrite the input string to avoid complicated special rules - in this case
preprocessMessage should be implemented:

(either of)
11a. isRedundant(str) - return true if str doesn't add any value for the
parser (e.g. talakattu in Telugu)
11b. preProcessMessage(input)
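
As a rough illustration of the two options, the sketch below shows a toy
isRedundant that drops a decorative code point and a toy preProcessMessage
that rewrites the input before parsing. The code points '^' and '~', the
object ToyRules and the helper stripRedundant are invented; real fonts will
need their own rules.

```javascript
// Hypothetical rules for an imaginary font.
var ToyRules = {
    // Option 1: '^' stands in for a glyph that carries no information
    // for the parser, so it can simply be dropped.
    isRedundant: function (str) {
        return str === "^";
    },

    // Option 2: rewrite the input instead of adding special parser rules.
    // In this toy font the sign '~' is typed before the code point it
    // belongs to; swapping the pair lets the parser see it afterwards.
    preProcessMessage: function (input) {
        return input.replace(/~(.)/g, "$1~");
    }
};

// Hypothetical first phase: remove redundant code points one at a time.
function stripRedundant(encoding, input) {
    var out = "";
    for (var i = 0; i < input.length; ++i) {
        var ch = input.charAt(i);
        if (!encoding.isRedundant(ch)) out += ch;
    }
    return out;
}
```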

Housekeeping - the transformer should be told about the new encoding by
defining a mapping from its name to its implementation in
src/content/transformers/Transformer.js. The JS file for the encoding should
go into the appropriate script folder. The JS file should be included in 2
XUL files in src/content folder - padma.xul and padmaMailOverlay.xul.

Decoding a font file - it is relatively easy to find out the mappings for a
font if you have the TTF for it. Here is a simple HTML file that will help
you in decoding - use it on a machine that already has the TTF file
installed. You will have to edit it to add the entry for your font.

<html>

<head>
<title>a</title>
<style type="text/css">
    div#abc {
        border:thin solid silver;
        padding-bottom:10px;
    }
</style>
</head>

<body>
<script type="text/javascript">

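    // Bytes 0x80-0x9F as Windows-1252 decodes them: byte value -> Unicode
    // code point. Bytes with no entry here are used unchanged.
    var arr = new Array();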
    arr[0x80] = 0x20AC;
    arr[0x82] = 0x201A;
    arr[0x83] = 0x0192;
    arr[0x84] = 0x201E;
    arr[0x85] = 0x2026;
    arr[0x86] = 0x2020;
    arr[0x87] = 0x2021;
    arr[0x88] = 0x02C6;
    arr[0x89] = 0x2030;
    arr[0x8A] = 0x0160;
    arr[0x8B] = 0x2039;
    arr[0x8C] = 0x0152;
    arr[0x8E] = 0x017D;
    arr[0x91] = 0x2018;
    arr[0x92] = 0x2019;
    arr[0x93] = 0x201C;
    arr[0x94] = 0x201D;
    arr[0x95] = 0x2022;
    arr[0x96] = 0x2013;
    arr[0x97] = 0x2014;
    arr[0x98] = 0x02DC;
    arr[0x99] = 0x2122;
    arr[0x9A] = 0x0161;
    arr[0x9B] = 0x203A;
    arr[0x9C] = 0x0153;
    arr[0x9E] = 0x017E;
    arr[0x9F] = 0x0178;

    function a()
    {
        var f1, f2 = "</font>", tr1 = "<tr>", tr2 = "</tr>",
            td1 = "<td align='center' width=\"12\">",
            td2 = "</td>", hr = "<hr>";
        var output = "<table>";

        if (self.document.inputForm.source.value == "Eenadu")
            f1 = "<font size=\"12\" face=\"Eenadu\" color=\"#0000FF\">";
        else if (self.document.inputForm.source.value == "AndhraJyothi")
            f1 = "<font face=\"SHREE-TEL-0900\" size=\"12\" color=\"#0000FF\">";

        for(var i = 32; i <= 255; ++i) {
            var code = arr[i] != null ? arr[i] : i;
            output += tr1 + td1 + i + td2 +
                      td1 + String.fromCharCode(i) + td2 +
                      td1 + 'Char:' + f1 + String.fromCharCode(code) + f2 + td2 +
                      tr2;
        }

        output += "</table>";
        document.getElementById("abc").innerHTML = output;
    }
</SCRIPT>

<FORM name="inputForm">
    <SELECT name ="source">
        <OPTION value = "Eenadu"> Eenadu </OPTION>
        <OPTION value = "AndhraJyothi"> Andhra Jyothi </OPTION>
    </SELECT>

    <INPUT type=button name="" value="Decode" onClick="a()">
</FORM>

<div id="abc">
</div>

</BODY>
</HTML>
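
The one non-obvious step in the page above is the arr table: its values are
the Windows-1252 assignments for bytes 0x80 to 0x9F, so the glyph stored at
such a byte in the font shows up in page text as the mapped code point. The
sketch below isolates that step so the mapped values can be printed as text
when writing lookup entries. The names cp1252, codePointFor, hex4 and
describe are made up here, and only a few of the table entries are repeated.

```javascript
// Subset of the Windows-1252 table from the page above (byte -> code point).
var cp1252 = {
    0x80: 0x20AC,
    0x85: 0x2026,
    0x91: 0x2018,
    0x92: 0x2019,
    0x99: 0x2122
};

// Code point that represents a given font byte in page text.
// Bytes without an entry map to themselves.
function codePointFor(byteValue) {
    return cp1252.hasOwnProperty(byteValue) ? cp1252[byteValue] : byteValue;
}

// Format a number as four upper-case hex digits.
function hex4(n) {
    var s = n.toString(16).toUpperCase();
    while (s.length < 4) s = "0" + s;
    return s;
}

// One line per byte, e.g. for pasting next to the glyph you see in the table.
function describe(byteValue) {
    return byteValue + " -> U+" + hex4(codePointFor(byteValue));
}
```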


Let me know if this helps.

Thanks,
nagarjuna

On 2/21/06, Guntupalli Karunakar <karunakar at randomink.org> wrote:
>
> Hi,
> I have been using Padma for a while & recommending it to many ppl.
> Its one great tool which solves the problems of font support faced
> when browsing indic websites.
> while currently quite a few encodings are supported, I was looking
> towards adding support for Shusha & some other fonts. is there a
> quick guide to how to add more encodings or websites?
> meanwhile i am trying to figure that out from the source code.
>
> Regards,
> Karunakar
>
> --
>
> *************************************
> * Work: http://www.indlinux.org     *
> * Blog: http://cartoonsoft.com/blog *
> *************************************