[Ocaml-pxp-users] xml with any charset encoding

Gerd Stolpmann info at gerd-stolpmann.de
Thu Aug 14 06:36:05 PDT 2003


On Wed, 2003-08-13 at 21:09, Anastasia Gornostaeva wrote:
> Hello.
> 
> I want to parse RSS from websites. These RSS files can be in any character
> encoding (not only ASCII or Latin letters).
> I want to feed them into PXP and receive UTF-8 data as output.
> How can I do this quickly and easily?

Simply select UTF-8 as the internal encoding, e.g.

let config = { default_config with encoding = `Enc_utf8 }

Then pass this config value to the parsing function. The effect is that
PXP can represent all characters that are assigned in Unicode.
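
As a rough sketch of that step, assuming a PXP release in which
default_config and from_string live in Pxp_types and the tree parser in
Pxp_tree_parser (older releases exposed the same functions through
Pxp_yacc), and using a made-up one-element document in place of a real
RSS feed:

```ocaml
(* Assumption: module names as in later PXP releases; adjust to Pxp_yacc
   for older ones. *)
open Pxp_types

(* Internal representation is UTF-8, whatever the input file declares. *)
let config = { default_config with encoding = `Enc_utf8 }

(* Hypothetical sample input: declared as ISO-8859-1, with one e-acute
   stored as the single byte 0xE9. *)
let sample =
  "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>caf\xE9</title>"

(* Parse in well-formedness mode and read the text of the root element.
   The string returned by #data is in the internal encoding, i.e. UTF-8. *)
let title_utf8 =
  let doc =
    Pxp_tree_parser.parse_wfdocument_entity
      config (from_string sample) Pxp_tree_parser.default_spec
  in
  doc # root # data

let () = print_endline title_utf8
```

Building this requires PXP with a lexer for the UTF-8 internal encoding
linked in. For real feeds you would replace from_string by from_file or
from_channel.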

If the RSS files use different encodings, these are automatically
converted to UTF-8. Actually, this step is performed by the netstring
library, and all character encodings supported by netstring can be
processed. I recommend that you use the newest version of netstring
(included in ocamlnet-0.96), because the conversion algorithm runs much
faster than in previous versions. Furthermore, it is more difficult to
misconfigure netstring.
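
If you ever need the same conversion outside of PXP, netstring can be
called directly. A minimal sketch, assuming the Netconversion.convert
function of recent ocamlnet versions and a made-up Latin-1 input string:

```ocaml
(* Assumption: ocamlnet's netstring library is installed and linked. *)

(* Hypothetical input: "cafe" with e-acute, encoded as ISO-8859-1
   (0xE9 is the single-byte e-acute). *)
let latin1 = "caf\xE9"

(* Convert from the source encoding to UTF-8. *)
let utf8 =
  Netconversion.convert ~in_enc:`Enc_iso88591 ~out_enc:`Enc_utf8 latin1

let () =
  Printf.printf "%d bytes in, %d bytes out\n"
    (String.length latin1) (String.length utf8)
```

With PXP you normally do not have to do this yourself, since the parser
takes the source encoding from the XML declaration.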

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd at gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------


