From ben at socialtools.net Sun May 16 13:18:58 2004
From: ben at socialtools.net (Benjamin Geer)
Date: Sun, 16 May 2004 21:18:58 +0100
Subject: [Ocaml-i18n] ANN: LocalCaml 0.1.0
Message-ID: <40A7CCB2.50104@socialtools.net>

This is to announce a very preliminary implementation of a message
catalog system for localising text in OCaml programs; it's called
LocalCaml.

The approach I've taken is partly inspired by Perl's Locale::Maketext
module[1]. In an attempt to provide the flexibility needed to adapt
sentence structure and morphology to numeric parameters in messages, the
library uses a general-purpose template engine, CamlTemplate[2], as a
language for writing message templates.

This isn't a proper release yet; I'm posting this message in order to
ask for feedback from the OCaml community. If you're interested in i18n
in OCaml, please have a look and let me know what you think.

You can get the tarball from here:

http://saucecode.org/localcaml/releases/localcaml-0.1.0.tar.gz

Anonymous CVS access is also available:

touch ~/.cvspass
cvs -d :pserver:anonymous@cvs.saucecode.org:/cvsroot login
cvs -z3 -d :pserver:anonymous@cvs.saucecode.org:/cvsroot co localcaml

Apologies for cross-posting. Please post followups to the ocaml-i18n
mailing list[3].

Ben

[1] http://search.cpan.org/~sburke/Locale-Maketext-1.09/lib/Locale/Maketext/TPJ13.pod
[2] http://saucecode.org/camltemplate/
[3] http://www.orcaware.com/mailman/listinfo/ocaml-i18n

From yoriyuki at mbg.ocn.ne.jp Thu May 20 14:49:20 2004
From: yoriyuki at mbg.ocn.ne.jp (Yamagata Yoriyuki)
Date: Fri, 21 May 2004 06:49:20 +0900 (JST)
Subject: [Ocaml-i18n] [ANN] Camomile 0.5.2
Message-ID: <20040521.064920.85423701.yoriyuki@mbg.ocn.ne.jp>

Camomile 0.5.2 is released. This is a bug fix release.

Camomile is a comprehensive Unicode library for OCaml. Camomile provides
a Unicode character type, UTF-8, UTF-16, UTF-32 strings, conversion
to/from about 200 encodings, collation and locale-sensitive case
mappings, and more.
The library is currently designed for Unicode Standard 3.2.

Download: http://prdownloads.sourceforge.net/camomile/camomile-0.5.2.tar.bz2
Changes: http://camomile.sourceforge.net/Changes.txt
Homepage: http://camomile.sourceforge.net

--
Yamagata Yoriyuki

From ben at socialtools.net Fri May 21 15:47:36 2004
From: ben at socialtools.net (Benjamin Geer)
Date: Fri, 21 May 2004 23:47:36 +0100
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
Message-ID: <40AE8708.4020504@socialtools.net>

I've been looking at the locale support in Camomile, and at Matthieu
Sozeau's OCamlI18n library, and I have a few comments and questions.

About Camomile:

Yoriyuki, what are your plans for the use of locales in Camomile? Where
did those IBM locale files come from, and what software knows how to
parse and use them fully? Maintaining locale data in files rather than
in source code seems sensible to me. I would be interested in working
on a project to handle locales this way in Caml.

About OCamlI18n:

Many locales use different characters for digits. In Arabic, either
Arabic or Hindi digits may be used. So there needs to be a way for a
locale-specific number formatter to choose the right kind of digits,
according to the locale but also according to the user's preference;
this of course applies to dates as well.

I'm trying to understand how the 'calendar' structures are meant to be
used; nothing in the library seems to use them. If the goal is to make
it possible for people to plug in their own calendar implementations,
why not make Calendar a class type rather than a module?
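
One way to picture the digit-selection requirement raised above is a
formatter that takes the digit style as an explicit, optional argument,
so that a user preference can override whatever the locale would pick.
What follows is only a sketch under stated assumptions: the type
`digits` and the functions `render_digits` and `format_int` are
hypothetical names, not part of OCamlI18n or Camomile, and "Hindi
digits" is assumed to mean the Arabic-Indic digits U+0660..U+0669,
which are emitted here as UTF-8 bytes.

```ocaml
(* Hypothetical sketch: none of these names come from OCamlI18n or Camomile. *)

(* Digit styles a caller may ask for. *)
type digits = Western | Arabic_indic

(* Rewrite ASCII digits in [s] according to [style]; other bytes pass through.
   Assumption: Arabic-Indic digit d is U+0660 + d, which is the two UTF-8
   bytes 0xD9, 0xA0 + d. *)
let render_digits (style : digits) (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  String.iter
    (fun c ->
      match style, c with
      | Arabic_indic, '0' .. '9' ->
          Buffer.add_char buf '\xd9';
          Buffer.add_char buf (Char.chr (0xA0 + Char.code c - Char.code '0'))
      | _ -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* The digit style defaults to Western; a user preference overrides it. *)
let format_int ?(digits = Western) (n : int) : string =
  render_digits digits (string_of_int n)
```

A real formatter would take its default from the locale data rather
than hard-coding Western digits; the optional argument only shows where
a per-user override could be threaded through.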
Ben

From yoriyuki at mbg.ocn.ne.jp Fri May 21 16:27:24 2004
From: yoriyuki at mbg.ocn.ne.jp (Yamagata Yoriyuki)
Date: Sat, 22 May 2004 08:27:24 +0900 (JST)
Subject: [Ocaml-i18n] Re: locales in Camomile and OCamlI18n
In-Reply-To: <40AE8708.4020504@socialtools.net>
References: <40AE8708.4020504@socialtools.net>
Message-ID: <20040522.082724.48536632.yoriyuki@mbg.ocn.ne.jp>

From: Benjamin Geer
Subject: locales in Camomile and OCamlI18n
Date: Fri, 21 May 2004 23:47:36 +0100

> Yoriyuki, what are your plans for the use of locales in Camomile? Where
> did those IBM locale files come from, and what software knows how to
> parse and use them fully? Maintaining locale data in files rather than
> in source code seems sensible to me. I would be interested in working
> on a project to handle locales this way in Caml.

The data come from ICU (http://oss.software.ibm.com/icu/). However, I
would advise against using them, since they are designed for a
particular C parser and do not have a regular syntax.

Later, IBM developed an XML version of them, which became the foundation
of the Locale Data Markup Language (LDML). You can find the DTD and
everything else at http://www.openi18n.org/specs/ldml/ and the data
repository at http://www.openi18n.org/subgroups/lade/locale/.

I wanted to migrate to LDML, but I did not have enough time (and
energy). If you make a library to access the data, then Camomile will
use it.

Whatever you choose for locale data, you may be interested in
Camomile's Locale module, since it implements a general method for
loading locale data with a fallback mechanism.
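
For readers unfamiliar with the idea, loading with fallback can be
sketched roughly as follows. This is not Camomile's Locale module or
its API: `fallback_chain` and `load_with_fallback` are hypothetical
names, locale names are assumed to use `_` as the separator, and the
final "root" entry is an assumption borrowed from how LDML names its
base locale.

```ocaml
(* Hypothetical sketch of locale fallback; not Camomile's actual Locale API. *)

(* "ja_JP_x" gives ["ja_JP_x"; "ja_JP"; "ja"; "root"]: strip one
   '_'-separated suffix at a time, then finish with an assumed "root". *)
let fallback_chain (locale : string) : string list =
  let rec go name acc =
    match String.rindex_opt name '_' with
    | Some i -> go (String.sub name 0 i) (name :: acc)
    | None -> List.rev ("root" :: name :: acc)
  in
  go locale []

(* Try [load] on each candidate in order and return the first hit. *)
let load_with_fallback (load : string -> 'a option) (locale : string)
    : 'a option =
  let rec first = function
    | [] -> None
    | name :: rest ->
        (match load name with
         | Some _ as hit -> hit
         | None -> first rest)
  in
  first (fallback_chain locale)
```

With a loader that only knows "ja" and "root", a request for "ja_JP"
would be served from "ja", and a request for "fr_FR" from "root".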
--
Yamagata Yoriyuki

From ben at socialtools.net Sat May 22 03:19:26 2004
From: ben at socialtools.net (Benjamin Geer)
Date: Sat, 22 May 2004 11:19:26 +0100
Subject: [Ocaml-i18n] Re: locales in Camomile and OCamlI18n
In-Reply-To: <20040522.082724.48536632.yoriyuki@mbg.ocn.ne.jp>
References: <40AE8708.4020504@socialtools.net> <20040522.082724.48536632.yoriyuki@mbg.ocn.ne.jp>
Message-ID: <40AF292E.9010409@socialtools.net>

Yamagata Yoriyuki wrote:
> I wanted to migrate to LDML, but I did not have enough time (and
> energy). If you make a library to access the data, then Camomile will
> use it.

This looks like a good idea. I'll study it some more and try to at least
make a start on it.

> Whatever you choose for locale data, you may be interested in
> Camomile's Locale module, since it implements a general method for
> loading locale data with a fallback mechanism.

I'll be sure to use Camomile's Locale module.

Ben

From mattam at altern.org Sat May 22 05:32:14 2004
From: mattam at altern.org (Matthieu Sozeau)
Date: Sat, 22 May 2004 14:32:14 +0200
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
In-Reply-To: <40AE8708.4020504@socialtools.net>
References: <40AE8708.4020504@socialtools.net>
Message-ID: <200405221432.14746.mattam@altern.org>

Hi,

On Saturday 22 May 2004 00:47, Benjamin Geer wrote:
> About OCamlI18n:
>
> Many locales use different characters for digits. In Arabic, either
> Arabic or Hindi digits may be used. So there needs to be a way for a
> locale-specific number formatter to choose the right kind of digits,
> according to the locale but also according to the user's preference;
> this of course applies to dates as well.
>
> I'm trying to understand how the 'calendar' structures are meant to be
> used; nothing in the library seems to use them. If the goal is to make
> it possible for people to plug in their own calendar implementations,
> why not make Calendar a class type rather than a module?

I chose to have a module type (Calendar) and also a record type
('a calendar) to allow polymorphic programming on dates, but you're
right in saying that a class type would be more appropriate. Calendars
are used in Date formatting functions obviously, but, as you point out,
the current design does not allow for other indications than the locale
(you could still have 2 local variants for Arabic representing the
different digit conventions).

I just updated the documentation on my website to reflect the current
code, which uses a functor from a format definition (which in turn is a
functor of a calendar) and can be used like this:

module DateFormat =
  I18N.Format(I18N.DateFormatDefinition(I18N.Date.Gregorian))

let fmt = DateFormat.parse_format ~locale src_pattern in
DateFormat.parse ~locale fmt src

The goal is to allow easy definition of formatters for particular
calendars (and also for numbers, although that code is currently
commented out in i18N.ml). Taking your example, we could have decimal
and roman Number modules (or classes), with a format definition for
decimals with different behavior for arabic and arabic_hindi.

Changing the design to object-style is needed for LDML anyway, so if you
are interested in working on it, I'll be happy to discuss a rework of
ocamli18n.

--
Do not underestimate the value of print statements for debugging.

From ben at socialtools.net Sun May 23 02:30:31 2004
From: ben at socialtools.net (Benjamin Geer)
Date: Sun, 23 May 2004 10:30:31 +0100
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
In-Reply-To: <200405221432.14746.mattam@altern.org>
References: <40AE8708.4020504@socialtools.net> <200405221432.14746.mattam@altern.org>
Message-ID: <40B06F37.3040905@socialtools.net>

Matthieu Sozeau wrote:
> Changing the design to object-style is needed for LDML anyway, so if you
> are interested in working on it, I'll be happy to discuss a rework of
> ocamli18n.

How about if I start by writing something that can load the LDML locale
data and represent it as Caml class types and other idiomatic Caml types?

Ben

From mattam at altern.org Sun May 23 04:58:33 2004
From: mattam at altern.org (Matthieu Sozeau)
Date: Sun, 23 May 2004 13:58:33 +0200
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
In-Reply-To: <40B06F37.3040905@socialtools.net>
References: <40AE8708.4020504@socialtools.net> <200405221432.14746.mattam@altern.org> <40B06F37.3040905@socialtools.net>
Message-ID: <200405231358.48345.mattam@altern.org>

On Sunday 23 May 2004 11:30, Benjamin Geer wrote:
> Matthieu Sozeau wrote:
> > Changing the design to object-style is needed for LDML anyway, so if
> > you are interested in working on it, I'll be happy to discuss a
> > rework of ocamli18n.
>
> How about if I start by writing something that can load the LDML locale
> data and represent it as Caml class types and other idiomatic Caml types?

That would be really nice! One little question though: what parser will
you use? I suppose Yamagata should have a say on that if it gets
integrated in Camomile.

From ben at socialtools.net Sun May 23 06:09:26 2004
From: ben at socialtools.net (Benjamin Geer)
Date: Sun, 23 May 2004 14:09:26 +0100
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
In-Reply-To: <200405231358.48345.mattam@altern.org>
References: <40AE8708.4020504@socialtools.net> <200405221432.14746.mattam@altern.org> <40B06F37.3040905@socialtools.net> <200405231358.48345.mattam@altern.org>
Message-ID: <40B0A286.3070303@socialtools.net>

Matthieu Sozeau wrote:
> That would be really nice! One little question though: what parser will
> you use? I suppose Yamagata should have a say on that if it gets
> integrated in Camomile.

I was going to use PXP, since it has good DTD and Unicode support, is
implemented entirely in Caml and seems very stable, and because I've
also used it in LocalCaml, which needs a validating parser. Would
that be OK with everyone?

Ben

From yoriyuki at mbg.ocn.ne.jp Sun May 23 16:53:48 2004
From: yoriyuki at mbg.ocn.ne.jp (Yamagata Yoriyuki)
Date: Mon, 24 May 2004 08:53:48 +0900 (JST)
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
In-Reply-To: <40B0A286.3070303@socialtools.net>
References: <40B06F37.3040905@socialtools.net> <200405231358.48345.mattam@altern.org> <40B0A286.3070303@socialtools.net>
Message-ID: <20040524.085348.59461508.yoriyuki@mbg.ocn.ne.jp>

From: Benjamin Geer
Subject: Re: [Ocaml-i18n] locales in Camomile and OCamlI18n
Date: Sun, 23 May 2004 14:09:26 +0100

> I was going to use PXP, since it has good DTD and Unicode support, is
> implemented entirely in Caml and seems very stable, and because I've
> also used it in LocalCaml, which needs a validating parser. Would
> that be OK with everyone?

PXP is good. I have some wishes for your library.

Camomile needs collation tables in its own format.
Generating them from locale data is computationally heavy, so loading
LDML would be a task for some conversion tool. (Or maybe for some
caching mechanism with a persistent repository, but then there are
problems with permissions.) To facilitate such a process, it would be
desirable to have a way to list all locales. Then a conversion tool
could compile all locales to collation tables.

Further, it would be good to have a generic mechanism to handle
locale-dependent data, not limited to LDML. Your library would be a
natural place for it. Then Camomile could manage all locale-specific
data through your library.

In addition, Camomile needs ISO language and country codes. The current
approach is to parse the locale name, but if you introduce non-standard
locale names, then some method for getting the ISO codes is necessary.
I think LDML has them, but we have to consider the performance problems
of loading LDML each time a locale is used.

--
Yamagata Yoriyuki

From yoriyuki at mbg.ocn.ne.jp Mon May 24 08:03:18 2004
From: yoriyuki at mbg.ocn.ne.jp (Yamagata Yoriyuki)
Date: Tue, 25 May 2004 00:03:18 +0900 (JST)
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
In-Reply-To: <1085372201.6065.163.camel@pelican.wigram>
References: <40B0A286.3070303@socialtools.net> <20040524.085348.59461508.yoriyuki@mbg.ocn.ne.jp> <1085372201.6065.163.camel@pelican.wigram>
Message-ID: <20040525.000318.41632617.yoriyuki@mbg.ocn.ne.jp>

From: skaller
Subject: Re: [Ocaml-i18n] locales in Camomile and OCamlI18n
Date: 24 May 2004 14:16:42 +1000

> This is easy to fix. Use PXP to parse the data
> and convert to a specified internal Ocaml data
> structure and simply Marshal it out.

Loading marshaled data is still expensive, so we would want to load it
only once in the process's lifetime. There would be two methods to
achieve this. One way is to load the locale data explicitly, as in

let locale = load_locale "ja_JP" in
...
let s = string_of_int ~locale 1234567 in
...

But the syntax is a bit heavy.
Another way is caching them in a (Weak)Hashtbl. (Camomile does this
currently.)

> Felix does this to lex/parse files, with a version
> and time stamp check to ensure the internal format
> is coherent and up to date. (Make sure to put the
> Ocaml compiler version in so the data is reparsed
> when you upgrade Ocaml too).

If we had support for persistent data shared across different instances
of the locale library running with different uids and permissions, then
that would be the way to go. Unfortunately, we do not have such support,
unless we throw away portability.

--
Yamagata Yoriyuki

From skaller at users.sourceforge.net Sun May 23 21:16:42 2004
From: skaller at users.sourceforge.net (skaller)
Date: 24 May 2004 14:16:42 +1000
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
In-Reply-To: <20040524.085348.59461508.yoriyuki@mbg.ocn.ne.jp>
References: <40B06F37.3040905@socialtools.net> <200405231358.48345.mattam@altern.org> <40B0A286.3070303@socialtools.net> <20040524.085348.59461508.yoriyuki@mbg.ocn.ne.jp>
Message-ID: <1085372201.6065.163.camel@pelican.wigram>

On Mon, 2004-05-24 at 09:53, Yamagata Yoriyuki wrote:

> but we have to consider the performance problems
> of loading LDML each time a locale is used.

This is easy to fix. Use PXP to parse the data
and convert to a specified internal Ocaml data
structure and simply Marshal it out.

Felix does this to lex/parse files, with a version
and time stamp check to ensure the internal format
is coherent and up to date. (Make sure to put the
Ocaml compiler version in so the data is reparsed
when you upgrade Ocaml too).
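
The marshal-with-a-stamp idea described above might look roughly like
the following. It is a minimal sketch, not code from Felix, Camomile or
any LDML loader: `header`, `format_version`, `save_cache` and
`load_cache` are hypothetical names, the caller is assumed to supply
the modification time of the source file, and the parsed locale data is
assumed to contain no closures or other values that Marshal cannot
handle.

```ocaml
(* Hypothetical sketch: a Marshal cache guarded by a version/timestamp stamp. *)

(* Bump this by hand whenever the cached data structure changes shape. *)
let format_version = 1

(* Stamp written in front of the payload. *)
type header = {
  ocaml : string;        (* compiler version that wrote the cache *)
  format : int;          (* value of [format_version] at write time *)
  source_mtime : float;  (* modification time of the parsed source file *)
}

(* Write the stamp, then the already-parsed data, to [path]. *)
let save_cache (path : string) ~(source_mtime : float) data : unit =
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () ->
      let h =
        { ocaml = Sys.ocaml_version; format = format_version; source_mtime }
      in
      Marshal.to_channel oc h [];
      Marshal.to_channel oc data [])

(* Return [Some data] only if the stamp matches; [None] tells the caller
   to reparse the source.  Marshal is not type-safe, so the caller must
   request the same type that was saved. *)
let load_cache (path : string) ~(source_mtime : float) =
  if not (Sys.file_exists path) then None
  else begin
    let ic = open_in_bin path in
    Fun.protect
      ~finally:(fun () -> close_in ic)
      (fun () ->
        let h : header = Marshal.from_channel ic in
        if h.ocaml = Sys.ocaml_version
           && h.format = format_version
           && h.source_mtime = source_mtime
        then Some (Marshal.from_channel ic)
        else None)
  end
```

A caller would try `load_cache` first and fall back to parsing the XML
and calling `save_cache` when it returns `None`. Where the cache file
may live, and who is allowed to write it, is a separate question that
this sketch does not address.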

--
John Skaller, mailto:skaller at users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net

From skaller at users.sourceforge.net Sun May 23 21:40:11 2004
From: skaller at users.sourceforge.net (skaller)
Date: 24 May 2004 14:40:11 +1000
Subject: [Ocaml-i18n] locales in Camomile and OCamlI18n
In-Reply-To: <20040524.085348.59461508.yoriyuki@mbg.ocn.ne.jp>
References: <40B06F37.3040905@socialtools.net> <200405231358.48345.mattam@altern.org> <40B0A286.3070303@socialtools.net> <20040524.085348.59461508.yoriyuki@mbg.ocn.ne.jp>
Message-ID: <1085373611.6065.188.camel@pelican.wigram>

On Mon, 2004-05-24 at 09:53, Yamagata Yoriyuki wrote:

> In addition, Camomile needs ISO language and country codes. The current
> approach is to parse the locale name, but if you introduce non-standard
> locale names, then some method for getting the ISO codes is necessary.

The ISO country code, language, and locale are independent
and should be passed around explicitly to all core functions.

Hope this is so .. it may be useful to have a 'guess' function
or several (eg: look at current C locale for locale)
but make sure these 'guess' tools are outside the core
library, and the core is fully parametric (reentrant).

--
John Skaller, mailto:skaller at users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net

From rich at annexia.org Thu May 27 07:59:44 2004
From: rich at annexia.org (Richard Jones)
Date: Thu, 27 May 2004 15:59:44 +0100
Subject: [Ocaml-i18n] Re: [Ocaml-lib-devel] Some (simple) functions I'd like to see in ExtLib ...
In-Reply-To: <20040527145341.GH9313@redhat.com>
References: <20040527131634.GA14414@redhat.com> <0c2c01c443f7$0cbdb980$ef01a8c0@warp> <20040527145341.GH9313@redhat.com>
Message-ID: <20040527145944.GA17156@redhat.com>

On Thu, May 27, 2004 at 03:53:41PM +0100, Richard Jones wrote:
> > > ** ExtChar (or perhaps better in UChar):
> > >
> > > is_space, is_alnum, is_digit, is_xdigit, etc. It's inexplicable why
> > > these were left out of the standard OCaml library.
> >
> > you're welcome to send a full featured ExtChar module.
>
> OK, will look at this. Do you think it should be ExtChar or UChar
> though? Since so much of the code I now write uses UTF-8 exclusively
> I'm loath to contribute any more 8-bit-char-specific code to the
> world ...

Actually I can answer my own question here.

We could define the ExtChar.is_* functions to only work correctly on
7-bit ASCII. They would return false on any character codes >= 128.
This way they should do the Right Thing when presented with UTF-8
strings too.

(CC-ing this to ocaml-i18n so that people who know what they're
talking about can comment).

Rich.

--
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
MOD_CAML lets you run type-safe Objective CAML programs inside the Apache
webserver. http://www.merjis.com/developers/mod_caml/
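
The ASCII-only proposal above can be illustrated with a short sketch.
These definitions are an assumption about what such functions might
look like, not code from ExtLib: `is_alpha` and `count` are helper
names added for illustration, and `is_space` is assumed to follow the
C-locale `isspace` set (space, tab, newline, vertical tab, form feed,
carriage return). Every byte of a multi-byte UTF-8 sequence is >= 128,
so each predicate returns false for it.

```ocaml
(* Hypothetical ASCII-only predicates; not the ExtLib implementation. *)

let is_digit c = c >= '0' && c <= '9'

let is_xdigit c =
  is_digit c || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')

(* Helper added for this sketch. *)
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

let is_alnum c = is_alpha c || is_digit c

(* Space plus the control characters 9..13 (tab through carriage return). *)
let is_space c = c = ' ' || (c >= '\t' && c <= '\r')

(* Count the bytes of [s] that satisfy [p]; helper for trying the
   predicates on UTF-8 strings. *)
let count (p : char -> bool) (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if p c then incr n) s;
  !n
```

For instance, in the UTF-8 string "a1\xc3\xa92" (an "e" with acute
accent sits between the two digits), `count is_digit` sees only the two
ASCII digits and `count is_alnum` sees only "a", "1" and "2"; the
accented letter is simply not classified, which is the trade-off of
this approach.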