[Ocaml-i18n] OCamlI18N-0.3 & ldml progress

Matthieu Sozeau mattam at mattam.org
Wed Sep 8 17:39:37 PDT 2004


Hello localizers,

	I have some good news about LDML :) I have an LDML parser working
which processes all ldml-1.1 main/ data files (without collations) in 3.7 
seconds, with a memory footprint of 10 MB (500 KB when loading just 
fr_FR and its parents). As I'm a lazy guy, I didn't want to write the parser 
by hand, so I wrote a parser generator that takes a DTD and outputs 
class types and class implementations. The parser knows only about the 
peculiarities of the LDML data model (inheritance and aliases) and uses some 
dumb data structures to customize the output (attribute conversion code, 
element list conversion code (e.g. to maps), factorization of classes...). 
I'm very happy with the result, which is flexible (the code's not that nice 
though). For instance, I was able to make alias resolution lazy quickly.

	About LDML proper, we now have all the classes and the inheritance mechanism 
implemented. It is usable if you don't need number and date formatting or 
collation data (so not all that useful yet ;). I will adapt the I18N module to 
the classes at some point, but for now I'd like to reflect a little on the 
parser generator and the problems discussed in the mail quoted below.

On Friday 20 August 2004 15:48, Yamagata Yoriyuki wrote:
> From: Matthieu Sozeau <mattam at mattam.org>
> Subject: [Ocaml-i18n] OCamlI18N-0.2 & ldml progress
> Date: Tue, 17 Aug 2004 20:16:34 +0200
>
> > > Further it would be good to have a generic mechanism to handle
> > > locale dependent data, not limited to LDML.  Your library would be a
> > > natural place for it.  Then Camomile can govern all locale specific
> > > data through your library.
> >
> > I wonder what you mean by locale dependent data, is it just objects with
> > different versions for each language ?
>
> Essentially so.  However, a collation table can be quite large, so we
> cannot assume they are all in memory.

I have sort of a proposal here: with a generic parser generator it would be 
easy to handle locale dependent data in a standard way. Take a message 
catalogue system, for example. You pass a DTD like this to the parser generator:
<code type="toy example">
<!ELEMENT messages (alias | (message*, special*) ) >

<!ELEMENT message ( #PCDATA ) >
<!ATTLIST message key NMTOKEN #REQUIRED >
</code>
and it generates the corresponding class hierarchy and parsing code, which 
handles aliases and inheritance just like LDML does. Aliases are resolved 
lazily, and you can just as well specify that collation data should be 
constructed lazily too. You can also handcraft a class hierarchy integrated with 
the LDML class types. It's not a closed solution, and it should provide a 
standard way of dealing with locale data.
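
To make the idea concrete, here is a minimal hand-written sketch of what lookup with inheritance and lazily resolved aliases could look like for such a message catalogue. It is not the generated code: the type `catalogue`, the functions `parent` and `lookup`, the `resolutions` counter and the toy data are all hypothetical names invented for this illustration, and inheritance is assumed to work by truncating the locale name down to "root".

```ocaml
(* Hypothetical sketch; none of these names come from OCamlI18N. *)
module SMap = Map.Make (String)

(* A catalogue either holds its own messages or is an alias whose
   source locale is only worked out the first time it is needed. *)
type catalogue =
  | Messages of string SMap.t
  | Alias of string Lazy.t

let of_list l =
  Messages (List.fold_left (fun m (k, v) -> SMap.add k v m) SMap.empty l)

(* Assumed inheritance rule: "fr_FR" -> "fr" -> "root". *)
let parent locale =
  if locale = "root" then None
  else
    match String.rindex_opt locale '_' with
    | Some i -> Some (String.sub locale 0 i)
    | None -> Some "root"

(* Counts how many times an alias is actually resolved. *)
let resolutions = ref 0

(* Toy data standing in for parsed catalogue files. *)
let table : (string, catalogue) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.replace table "root" (of_list [ "hello", "Hello"; "bye", "Goodbye" ]);
  Hashtbl.replace table "fr" (of_list [ "hello", "Bonjour" ]);
  Hashtbl.replace table "fr_FR" (of_list []);
  Hashtbl.replace table "fr_BE" (Alias (lazy (incr resolutions; "fr_FR")))

(* Look a key up in a locale, following aliases, then parents. *)
let rec lookup locale key =
  match Hashtbl.find_opt table locale with
  | Some (Alias source) -> lookup (Lazy.force source) key
  | Some (Messages m) ->
    (match SMap.find_opt key m with
     | Some v -> Some v
     | None -> fallback locale key)
  | None -> fallback locale key

and fallback locale key =
  match parent locale with
  | Some p -> lookup p key
  | None -> None
```

With this toy data, `lookup "fr_BE" "hello"` goes through the alias to fr_FR, finds nothing there and falls back to fr, while `lookup "fr_BE" "bye"` ends up in root. The alias is resolved once, on first use, however many lookups follow.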

> > > In addition, Camomile needs ISO language and country code.  Current
> > > approach is parsing locale name, but if you introduce non-standard
> > > locale name, then some method for getting ISO codes is necessary.
> >
> > What sort of non-standard locale names do you expect ?
>
> Using aliases for locale names appears quite common.  (For example,
> catalan for ca_ES)

A canonicalization function should suffice, no? Sure, we could support that by 
having a 'string -> locale' function in I18N.Locale that could be modified by 
users. Is that what you're asking for?
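
As a rough sketch of such a user-extensible canonicalization step: the record type `locale`, the alias table and the functions `register_alias`, `canonicalize` and `locale_of_string` below are assumptions made up for this example, not the actual I18N.Locale interface. It only handles names of the form language, language_COUNTRY, and an optional ".encoding" or "@modifier" suffix.

```ocaml
(* Hypothetical locale record; the real I18N.Locale type may differ. *)
type locale = { language : string; country : string option }

(* User-extensible alias table, e.g. "catalan" -> "ca_ES". *)
let aliases : (string, string) Hashtbl.t = Hashtbl.create 16

let register_alias name canonical =
  Hashtbl.replace aliases (String.lowercase_ascii name) canonical

(* Replace a known alias by its canonical name, case-insensitively. *)
let canonicalize name =
  match Hashtbl.find_opt aliases (String.lowercase_ascii name) with
  | Some canonical -> canonical
  | None -> name

(* Keep only the part of [s] before the first occurrence of [c]. *)
let cut c s =
  match String.index_opt s c with
  | Some i -> String.sub s 0 i
  | None -> s

let locale_of_string s =
  (* Drop a suffix such as ".UTF-8" or "@euro" after canonicalizing. *)
  let s = cut '@' (cut '.' (canonicalize s)) in
  match String.split_on_char '_' s with
  | [ lang ] -> { language = String.lowercase_ascii lang; country = None }
  | lang :: country :: _ ->
    { language = String.lowercase_ascii lang;
      country = Some (String.uppercase_ascii country) }
  | [] -> invalid_arg "locale_of_string"

(* A user registers the alias from the example above. *)
let () = register_alias "catalan" "ca_ES"
```

Camomile could then read the ISO codes from the resulting record rather than parsing the name the user typed.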

> >  ISO codes are just 15kb in total, and you don't have to load a lot of
> > ldml files to support a dozen locales in a program, apart from the
> > collation table creation/loading, what performance problems do you see ?
>
> I forget what I was thinking about, but for example Chinese locale
> definitions are rather large, so we cannot ignore their memory cost and
> loading time.
>
> Just a random thought.

I think laziness should be enough to handle this cost, but some think a 
cache would be needed, with weak pointers I suppose (Benjamin?). Is it 
really a normal use case to load all the data for a particular locale and use 
it only once in a run? (In that case, you'd just have to clear the locale 
pool to free the memory, in my humble opinion.)
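
A minimal sketch of the "load on demand, clear the pool when done" idea, using a plain hash table rather than weak pointers. The names `data`, `load`, `get`, `clear_pool` and the `loads` counter are hypothetical, and `load` is a stand-in for actually parsing a locale's files.

```ocaml
(* Hypothetical stand-in for parsed locale data. *)
type data = { name : string; payload : string list }

(* Counts real loads, to show that the pool avoids repeating them. *)
let loads = ref 0

(* Pretend loader; a real one would parse the locale's files. *)
let load name =
  incr loads;
  { name; payload = [ "data for " ^ name ] }

(* The locale pool: data is loaded on first request and then reused. *)
let pool : (string, data) Hashtbl.t = Hashtbl.create 16

let get name =
  match Hashtbl.find_opt pool name with
  | Some d -> d
  | None ->
    let d = load name in
    Hashtbl.replace pool name d;
    d

(* Drop every pooled locale so the garbage collector can reclaim it. *)
let clear_pool () = Hashtbl.reset pool
```

A program that needs a large locale only once would call `get`, use the data, then call `clear_pool ()`. A cache built on weak pointers would instead leave the decision to the garbage collector, at the price of less predictable reloads.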

I just released OCamlI18N-0.3. It now requires PXP; see the README for 
LDML-related installation instructions.

-- mattam