[Padma] [ilugd] Indian language character conversion
Nagarjuna Venna
vnagarjuna at gmail.com
Tue Nov 7 18:38:18 PST 2006
Some feedback:
1. You use the word 'language' when you really mean 'script' in many
places (there is no Devanagari language). The distinction should be
clear.
2. The same goes for 'characters'. For example, in Section C you say
"Thus, characters are...". That word is very ambiguous here.
3. I see problems with the six categories defined in the doc:
a. It is not clear to me where a syllable like 'shri' falls. It seems
that the three categories you have (consonant, conjunct and letter) are
exactly the same: they are all syllables. You should probably define a
single 'syllable' category and add an extra attribute identifying each
one as a vowel, consonant or conjunct.
b. You should use the word 'vattu' for half-post. It is used in
OpenType fonts, the Unicode specs, etc.
c. You should also use the phrase 'half form' for half_pre. This is,
again, a well-known and well-understood term.
d. The definition of split_matra makes no sense to me. Matras can go
not only to the left or right, they can also go above and below the
syllable (half above, half below).
e. In many scripts, a given shape (mostly vattus) can have a post-base
form, a pre-base form and a below-base form. Any given font can
support more than one of these forms and a lot of fonts do. You need
attributes to describe these.
f. Many fonts split matras even if they do not have two distinct
parts. You should be able to accommodate that.
4. D(4) - See the implementation of Padma to avoid having to sort by
the maximum length of the key. Most of the time one key sequence is
enough.
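
To illustrate points 3a, 3e and 3f, here is a minimal sketch of the kind of attributes I mean. It is not taken from the design document or from Padma; every name in it (Kind, Position, Syllable, FontShape, supports, ra_vattu, split_matra) is made up for the example.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto


class Kind(Enum):
    """Attribute that replaces separate vowel/consonant/conjunct categories."""
    VOWEL = auto()
    CONSONANT = auto()
    CONJUNCT = auto()


class Position(Flag):
    """Where a shape sits relative to the base; values can be combined."""
    PRE_BASE = auto()
    POST_BASE = auto()
    ABOVE_BASE = auto()
    BELOW_BASE = auto()


@dataclass(frozen=True)
class Syllable:
    text: str   # Unicode text of the syllable
    kind: Kind  # the extra attribute suggested in 3a


@dataclass(frozen=True)
class FontShape:
    name: str        # hypothetical label for a glyph or glyph group
    forms: Position  # every placement this particular font provides


def supports(shape: FontShape, position: Position) -> bool:
    """Return True if the font provides the shape in the given position."""
    return position in shape.forms


# A vattu that a font supplies in both post-base and below-base forms (3e).
ra_vattu = FontShape("ra_vattu", Position.POST_BASE | Position.BELOW_BASE)

# A matra stored as a list of positioned pieces, so a font may split it
# whether or not it has two visually distinct parts (3d, 3f).
split_matra = [
    FontShape("matra_left", Position.PRE_BASE),
    FontShape("matra_top", Position.ABOVE_BASE),
]
```

Using a flag set for the positions means one shape can carry several forms at once, and a matra modelled as a list of pieces is not limited to a left/right pair.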
I don't have the time to discuss the rest of the document, but I don't
see a clear definition of the parser. The only way to parse Indic
scripts is to extract syllables from the input and operate on them.
That is, you scan the input characters until they form a complete
syllable. Many documents out there describe what makes a syllable in
Indic scripts (syllables that are made only of vowels, syllables that
have a series of consonants followed by a vowel sign followed by vowel
modifiers, etc.).
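
As a rough illustration of syllable extraction, here is a simplified sketch for Devanagari only. It is not Padma's code. The code point ranges come from the Unicode Devanagari block, the grammar ignores many cases a real parser must handle (for example the extra consonants at U+0958 to U+095F), and the names SYLLABLE and syllables are made up for the example.

```python
import re

# Simplified Devanagari classes (an assumption, not a complete grammar).
VOWEL = "[\u0904-\u0914]"           # independent vowels
CONSONANT = "[\u0915-\u0939]\u093C?"  # consonant with optional nukta
VIRAMA = "\u094D"
VOWEL_SIGN = "[\u093E-\u094C]"      # dependent vowel signs (matras)
MODIFIER = "[\u0901-\u0903]"        # candrabindu, anusvara, visarga

# A syllable is either a vowel plus modifiers, or a run of consonants
# joined by viramas, then an optional vowel sign or final virama,
# then modifiers.
SYLLABLE = re.compile(
    f"(?:{VOWEL}{MODIFIER}*"
    f"|{CONSONANT}(?:{VIRAMA}{CONSONANT})*"
    f"(?:{VOWEL_SIGN}|{VIRAMA})?{MODIFIER}*)"
)


def syllables(text):
    """Split text into syllables; unmatched characters pass through singly."""
    out = []
    pos = 0
    while pos < len(text):
        match = SYLLABLE.match(text, pos)
        if match:
            out.append(match.group())
            pos = match.end()
        else:
            out.append(text[pos])
            pos += 1
    return out
```

With this grammar, 'shri' (U+0936 U+094D U+0930 U+0940) comes out as a single syllable, and the converter can then operate on it as a unit instead of on individual code points.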
One suggestion that I can give you is that you go over the Padma
source code and look at the design. Almost everything that you are
trying to do has been done there and works for pretty much every Indic
script.
Thanks,
nagarjuna
On 11/7/06, vivek khurana <khuranavivek_in at yahoo.com> wrote:
>
> hi!
>
> IMHO this appears to be more a requirements specification
> than a design document.
>
> regards
> VK
> --- Gora Mohanty <gora at sarai.net> wrote:
>
> > Hi,
> > My apologies to people who are getting this multiple times. I have
> > discussed the issue of a generalised approach to Indian language
> > character conversion with several people, and have been working on it
> > on and off for several months now. This would cover areas like keymaps,
> > font converters, and transliterators.
> >
> > I have finally managed to write up a design document for the software,
> > an event precipitated by the need for Sarai CyberMohalla to convert
> > large amounts of Unicode text to an 8-bit font. A preliminary version of
> > the document is on a Wiki page at
> > http://cmwiki.sarai.net/index.php/FontConversion , and I will soon be
> > preparing a more detailed write-up. At this point, I am mainly looking
> > for technical comments on the software design document at
> > http://cmwiki.sarai.net/index.php/PratilipiDesign , though other
> > comments are also appreciated. I would also appreciate comments on
> > whether the design is deficient for non-Sanskrit-based languages. If you
> > see major flaws in the design, please let me know soon as I am starting
> > the coding.
> >
> > Regards,
> > Gora
> >
> >