[Ocaml-pxp-users] PXP: Saving memory

Till Varoquaux till.varoquaux at gmail.com
Tue Apr 18 04:34:43 PDT 2006


I guess this is the kind of case where you would want to avoid loading the
whole document into memory. Hopefully the processing can be stream-based, in
which case other XML parsers would seem more appropriate (e.g. expat).
If your document structure is very simple and straightforward, I guess you
could even use a lexer instead of an XML parser. However, this could cause
problems, since the same meaningful information can be written with
different syntax and style (e.g. using entities, or switching the encoding in
the XML declaration...).
This was a very simple and straightforward answer, so I guess you were
already aware of all this, in which case I apologize for the useless answer...
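A minimal sketch of the lexer approach, using only the OCaml standard library. It assumes (hypothetically) that every row is written literally as `<row>` ... `</row>` with no attributes; `iter_rows`, `of_channel` and `of_string` are made-up names for illustration, not part of PXP. It has exactly the weaknesses mentioned above: entities are not decoded, and CDATA, comments and non-ASCII encodings are not handled, so it is only safe if you control whoever produces the file.

```ocaml
(* Hypothetical streaming scanner for a file shaped like
   <rows><row>...</row><row>...</row></rows>.
   It only recognises the literal tags <row> and </row>; it does NOT
   decode entities or handle attributes, CDATA, comments or encodings. *)

let open_tag = "<row>"
let close_tag = "</row>"

(* True when the buffer's current contents end with [s]. *)
let ends_with buf s =
  let bl = Buffer.length buf and sl = String.length s in
  bl >= sl && Buffer.sub buf (bl - sl) sl = s

(* Pull characters from [next] until it returns [None], calling [f] on the
   raw text between each <row> and </row>. Only the current row (or a
   short tail of the text outside rows) is kept in memory. *)
let iter_rows (next : unit -> char option) (f : string -> unit) : unit =
  let buf = Buffer.create 256 in
  let inside = ref false in
  let rec loop () =
    match next () with
    | None -> ()
    | Some c ->
      Buffer.add_char buf c;
      if not !inside then begin
        if ends_with buf open_tag then begin
          inside := true;
          Buffer.clear buf
        end else if Buffer.length buf > 64 then begin
          (* Outside a row: keep just enough characters to spot "<row>". *)
          let keep = String.length open_tag - 1 in
          let tail = Buffer.sub buf (Buffer.length buf - keep) keep in
          Buffer.clear buf;
          Buffer.add_string buf tail
        end
      end else if ends_with buf close_tag then begin
        f (Buffer.sub buf 0 (Buffer.length buf - String.length close_tag));
        inside := false;
        Buffer.clear buf
      end;
      loop ()  (* tail call: constant stack depth *)
  in
  loop ()

(* Character sources: one for a real file, one for testing on a string. *)
let of_channel ic () = try Some (input_char ic) with End_of_file -> None

let of_string s =
  let i = ref 0 in
  fun () ->
    if !i < String.length s then begin
      let c = s.[!i] in
      incr i;
      Some c
    end else None
```

For a real report you would call something like `iter_rows (of_channel (open_in "report.xml")) handle_row` (file name and `handle_row` hypothetical). Memory use is then bounded by the largest single row rather than by the whole 600-700 MB document.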
Regards,
Till
On 4/18/06, Richard Jones <rich at annexia.org> wrote:
>
>
> We have a problematic XML document (a daily report) which runs to
> something like 600-700 MB in size.  It is proving hard to parse this
> document with PXP, because our machine runs out of memory and starts
> thrashing.  In future this document will only grow in size.
>
> We're looking for options to reduce the amount of memory used.  One
> option would be to somehow parse the document incrementally, but I
> don't think this is possible with PXP.
>
> Another option would be to use "pools".  However documentation is very
> thin on how to use these.  It seems that the parsing function we are
> using, parse_wfdocument_entity, doesn't allow pools to be passed, and
> that's assuming we even knew how to create pools in the first place,
> which isn't very obvious.
>
> The document isn't very complicated - it's just a simple list of
> <row>'s.
>
> Can someone give me suggestions?
>
> Rich.
>
> PS. One thing we found when parsing this, is that #find_all_elements
> isn't tail recursive, meaning that it causes a crash on even fairly
> modest documents.
>
> --
> Richard Jones, CTO Merjis Ltd.
> Merjis - web marketing and technology - http://merjis.com
> Team Notepad - intranets and extranets for business -
> http://team-notepad.com
> _______________________________________________
> Ocaml-pxp-users mailing list
> Ocaml-pxp-users at orcaware.com
> http://www.orcaware.com/mailman/listinfo/ocaml-pxp-users
>
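On the PS about `#find_all_elements`: the following does not use PXP's node API at all. It is a generic sketch, on a hypothetical `tree` type, of the usual way to avoid this kind of stack overflow: keep the pending subtrees in an explicit work list so that every recursive call is a tail call. Note that `children @ rest` would reintroduce the problem with older OCaml versions when a node has very many children (e.g. a root with millions of `<row>`s), so it is spelled with `List.rev` and `List.rev_append`, both of which are tail-recursive.

```ocaml
(* Hypothetical element tree: element name and children. This is NOT
   PXP's node type; it only illustrates the traversal pattern. *)
type tree = Node of string * tree list

(* Collect every node named [name] in depth-first document order.
   Pending subtrees live in an explicit list instead of on the call
   stack, and every list operation used is tail-recursive. *)
let find_all (name : string) (root : tree) : tree list =
  let rec go acc pending =
    match pending with
    | [] -> List.rev acc
    | (Node (n, children) as node) :: rest ->
      let acc = if n = name then node :: acc else acc in
      (* Equivalent to [children @ rest], but tail-recursive. *)
      go acc (List.rev_append (List.rev children) rest)
  in
  go [] [root]
```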