From mattam at mattam.org Wed Sep 8 17:39:37 2004
From: mattam at mattam.org (Matthieu Sozeau)
Date: Thu, 9 Sep 2004 02:39:37 +0200
Subject: [Ocaml-i18n] OCamlI18N-0.3 & ldml progress
In-Reply-To: <20040820.224826.104048973.yoriyuki@mbg.ocn.ne.jp>
References: <20040816222633.GA24159@grand>
 <200408172017.10838.mattam@mattam.org>
 <20040820.224826.104048973.yoriyuki@mbg.ocn.ne.jp>
Message-ID: <200409090239.38028.mattam@mattam.org>

Hello localizers,

I have some good news about LDML :)

I have an LDML parser working which processes all ldml-1.1 main/ data
files (without collations) in 3.7 seconds, with a memory footprint of
10 MB (500 KB when loading just fr_FR and its parents).

As I'm a lazy guy I didn't want to write the parser by hand, so I wrote
a parser generator that takes a DTD and outputs class types and class
implementations. The parser knows only about the peculiarities of the
LDML data model (inheritance and aliases) and uses some dumb data
structures to customize the output (attribute conversion code, element
list conversion code (e.g. to maps), factorization of classes...).
I'm very happy with the result, which is flexible (the code's not that
nice though). For instance, I "lazified" alias resolution quickly.

About LDML proper, we now have all the classes and the inheritance
mechanism implemented. It is usable if you don't need number and date
formatting or collation data (so not that useful yet ;).

I will adapt the I18N module to the classes at some point, but for now
I'd like to reflect a little on the parser generator and the problems
discussed in the mail quoted below.

On Friday 20 August 2004 15:48, Yamagata Yoriyuki wrote:
> From: Matthieu Sozeau
> Subject: [Ocaml-i18n] OCamlI18N-0.2 & ldml progress
> Date: Tue, 17 Aug 2004 20:16:34 +0200
>
> > > Further it would be good to have a generic mechanism to handle
> > > locale dependent data, not limited to LDML. Your library would be a
> > > natural place for it.
> > > Then Camomile can govern all locale-specific
> > > data through your library.
> >
> > I wonder what you mean by locale dependent data, is it just objects with
> > different versions for each language?
>
> Essentially so. However, a collation table can be quite large, so we
> cannot assume they are all in memory.

I have sort of a proposal here: with a generic parser generator it would
be easy to handle locale dependent data in a standard way. Take for
example a message catalogue system: you pass a DTD like this to the
parser generator:

and it generates the corresponding class hierarchy and parsing code,
which handles aliases and inheritance just like LDML does. Aliases are
resolved lazily, and you can just as well specify that collation data
should be lazily constructed too. You can also handcraft a class
hierarchy integrated with the LDML class types. It's not a closed
solution, and it should provide a standard way of dealing with locale
data.

> > > In addition, Camomile needs ISO language and country codes. The
> > > current approach is parsing the locale name, but if you introduce
> > > non-standard locale names, then some method for getting ISO codes
> > > is necessary.
> >
> > What sort of non-standard locale names do you expect?
>
> Using aliases for locale names appears quite common. (For example,
> catalan for ca_ES)

A canonicalization function should suffice, no? Sure, we could support
that by having a 'string -> locale' function in I18N.Locale that users
could modify. Is that what you're asking for?

> > ISO codes are just 15 KB in total, and you don't have to load a lot of
> > ldml files to support a dozen locales in a program. Apart from the
> > collation table creation/loading, what performance problems do you see?
>
> I forget what I was thinking about, but for example Chinese locale
> definitions are rather large, so we cannot ignore their memory cost and
> loading time.
>
> Just a random thought.
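
[Illustration] Two ideas from the exchange above, a user-modifiable
'string -> locale' canonicalization function and lazily constructed
locale data, can be sketched in a few lines of OCaml. This is only a
minimal sketch under stated assumptions: every name in it (locale,
register_alias, locale_of_string, make_locale_data, load_count) is
hypothetical and is not taken from OCamlI18N or Camomile, and the
"collation table" is a placeholder string list rather than real data.

```ocaml
(* Hypothetical sketch: none of these names come from OCamlI18N or Camomile. *)

type locale = { language : string; country : string option }

(* User-extensible alias table, e.g. "catalan" -> "ca_ES". *)
let aliases : (string, string) Hashtbl.t = Hashtbl.create 16

let register_alias name target = Hashtbl.replace aliases name target

(* Parse "ll" or "ll_CC" into a locale record. *)
let parse_locale_name name =
  match String.index_opt name '_' with
  | None -> { language = name; country = None }
  | Some i ->
    { language = String.sub name 0 i;
      country = Some (String.sub name (i + 1) (String.length name - i - 1)) }

(* Canonicalization: resolve one level of alias, then parse the result. *)
let locale_of_string name =
  let canonical =
    match Hashtbl.find_opt aliases name with
    | Some target -> target
    | None -> name
  in
  parse_locale_name canonical

(* Counts how many times the (placeholder) collation data was built. *)
let load_count = ref 0

(* Expensive data sits behind Lazy.t, so it is built on first use only. *)
type locale_data = { collation : string list Lazy.t }

let make_locale_data loc =
  { collation =
      lazy (incr load_count;
            [ "collation table for " ^ loc.language ]) }
```

After register_alias "catalan" "ca_ES", calling locale_of_string
"catalan" yields the same record as locale_of_string "ca_ES". Creating
a locale_data value does not build the collation field; the first
Lazy.force does, and later forces reuse the cached result.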

I think laziness should be enough to handle this cost, but some think a
cache would be needed, with weak pointers I suppose (Benjamin?). Is it
really a normal use case to load all data for a particular locale and
use it only once in a run? (In that case, you'd just have to clear the
locale pool to free memory, in my humble opinion.)

I just released OCamlI18N-0.3. It now requires PXP, and you can read
the README for LDML-related installation instructions.

-- 
mattam

From yoriyuki at mbg.ocn.ne.jp Thu Sep 9 05:57:45 2004
From: yoriyuki at mbg.ocn.ne.jp (Yamagata Yoriyuki)
Date: Thu, 09 Sep 2004 21:57:45 +0900 (JST)
Subject: [Ocaml-i18n] [ANN] Camomile 0.6.0
Message-ID: <20040909.215745.98888043.yoriyuki@mbg.ocn.ne.jp>

Camomile 0.6.0 is released.

Download: http://prdownloads.sourceforge.net/camomile/camomile-0.6.0.tar.bz2
Homepage: http://camomile.sourceforge.net/

Changes are

* Support Common I/O classes
  (http://www.ocaml-programming.de/rec/IO-Classes.html)
  except non-blocking I/O, which is not supported.
* Remove all C bindings and related functions.
* Remove stdlib replacement introduced in 0.5.*
* UPervasives
  - utf8_*_channel are removed.
  - normalization modes are removed.
* UChar
  - UChar.is_printable is removed.
  - unsafe operations are removed.
  - UChar.int_of_uchar is renamed to UChar.int_of
  - UChar.uchar_of_int is renamed to UChar.of_int
* Locale
  - Locale.current_locale, Locale.set_locale are removed.
* CharEncoding
  - CharEncoding.enc_name is removed.
  - new classes:
    class CharEncoding.convert_uchar_input
    class CharEncoding.convert_uchar_output
    class CharEncoding.convert_input
    class CharEncoding.convert_output

--
Yamagata Yoriyuki