[Ocaml-pxp-users] Handling undeclared entities

Dario Teixeira darioteixeira at yahoo.com
Thu Jul 16 09:43:35 PDT 2009


Hi,

I am using PXP to parse a small HTML-like markup.  I would like to allow
the use of common HTML entities in the source text (such as &euro;), but I
don't want to include a list of *all* of them in the DTD (note that these
are eventually checked for validity somewhere else; I just don't need this
task to be performed also by PXP).

Now, the PXP manual mentions several times that entities are automatically
converted into regular #PCDATA, and there doesn't seem to be a way of passing
them unmodified to the processing code.  Therefore, if they are not declared
in the DTD I get a parsing error.

One solution I can think of is to preprocess the source file, using regexps
to replace entity references by a special node.  Something like this:
"the symbol is &euro;" -> "the symbol is <entity>euro</entity>".
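
To make the idea concrete, here is a minimal sketch of such a preprocessing
pass in plain OCaml, using a hand-written scan instead of a regexp library.
The function name protect_entities and the element name "entity" are only
illustrative choices, and the sketch assumes that entity names consist of
ASCII letters and digits only.  It leaves the five predefined XML entities
and numeric character references alone so the parser can still resolve them.

```ocaml
(* Sketch only: rewrite named entity references such as "&euro;" into
   <entity>euro</entity> before handing the text to the XML parser.
   The names used here are hypothetical, not part of PXP. *)

(* Assumption: entity names are made of ASCII letters and digits. *)
let is_name_char c =
  match c with
  | 'a'..'z' | 'A'..'Z' | '0'..'9' -> true
  | _ -> false

(* Predefined XML entities are left untouched. *)
let predefined = ["amp"; "lt"; "gt"; "apos"; "quot"]

let protect_entities (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create (n + 16) in
  let rec go i =
    if i >= n then ()
    else if s.[i] = '&' then begin
      (* Scan the candidate name that follows '&'. *)
      let j = ref (i + 1) in
      while !j < n && is_name_char s.[!j] do incr j done;
      let name = String.sub s (i + 1) (!j - i - 1) in
      if name <> "" && !j < n && s.[!j] = ';'
         && not (List.mem name predefined)
      then begin
        Buffer.add_string buf ("<entity>" ^ name ^ "</entity>");
        go (!j + 1)
      end else begin
        (* Not a named reference handled here (e.g. "&#8364;" or a bare
           '&'): copy the '&' and carry on with the next character. *)
        Buffer.add_char buf '&';
        go (i + 1)
      end
    end else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

Note that this scan knows nothing about XML structure: it would also rewrite
references inside CDATA sections, comments and attribute values, where an
element is either wrong or not allowed at all.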

This solution is of course way too kludgy and error-prone.  Is there a better
alternative within PXP?

Thanks!
Best regards,
Dario Teixeira



      

