[Ocaml-pxp-users] Ignore listed DTD, Validate Against Different Local One

Gerd Stolpmann info at gerd-stolpmann.de
Wed Jan 2 09:01:35 PST 2008


On Saturday, 22.12.2007, at 19:05 -0600, Jeff Wheeler wrote:
> Hey,
> 
> I'm new to this list, and for the most part have only played with OCaml
> a week or so, but I'm having some trouble with Pxp in OCaml.
> 
> I'm trying to validate a document with a PUBLIC DTD (to be specific, the
> menu XML representation in ~/.config/menus/gnome-applications.menu),
> which starts with this:
> 
> <!DOCTYPE Menu
>   PUBLIC '-//freedesktop//DTD Menu 1.0//EN'
>   'http://standards.freedesktop.org/menu-spec/menu-1.0.dtd'>
> 
> It makes no sense to try and download this doctype every time I load up
> the script (and Pxp seems to make that awkward, requiring a custom
> resolver), so I'd like to save that document locally and validate
> against that SYSTEM DTD instead.

Well, PXP treats these PUBLIC or SYSTEM strings simply as identifiers.
Unless configured otherwise, the only active resolution method is the
one that handles "file" SYSTEM ids. With custom resolvers you can
interpret these strings any way you want, and the module Pxp_reader
contains many useful classes and functions that make this easy. For
example,

let c =
  new Pxp_reader.lookup_public_id_as_file
    [ "-//freedesktop//DTD Menu 1.0//EN",
      "/usr/share/...whereever.dtd"
    ]

defines a narrow resolver that maps exactly this PUBLIC id to this file
name (and resolves nothing else).

All the Pxp_types.from_* functions have an ~alt argument, and you can
make such special resolvers available by passing ~alt, e.g.

let source =
  Pxp_types.from_file ~alt:[c] "/my/file.xml"

The effect is that the standard way of resolving IDs defined by
from_file is extended by the capabilities of c.
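Putting the two fragments together, a minimal end-to-end sketch could
look like this. It assumes the Pxp_tree_parser.parse_document_entity
entry point together with Pxp_types.default_config and
Pxp_tree_parser.default_spec. The function name parse_menu and its two
path parameters are hypothetical and only serve the illustration.

```ocaml
(* Sketch only; needs the pxp findlib package.
   parse_menu, dtd_path and xml_path are hypothetical names. *)
let parse_menu ~dtd_path xml_path =
  (* Resolver that knows exactly one PUBLIC id and nothing else. *)
  let c =
    new Pxp_reader.lookup_public_id_as_file
      [ "-//freedesktop//DTD Menu 1.0//EN", dtd_path ] in
  (* from_file resolves file names itself; ~alt adds the resolver c. *)
  let source = Pxp_types.from_file ~alt:[c] xml_path in
  (* Parses and validates; PXP reports problems as exceptions. *)
  Pxp_tree_parser.parse_document_entity
    Pxp_types.default_config source Pxp_tree_parser.default_spec
```

If parsing fails, Pxp_types.string_of_exn can turn the exception into a
readable message.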

> So far, I haven't found any way to make the Pxp parser even get past the
> doctype in there without choking.
> 
> Anyways, I'd like to replace the doctype Pxp uses to validate the tree
> with before it starts parsing, and not even look at the PUBLIC one that
> it currently chokes on. Any hints for starting?

There is no way for a resolver to know which part of the document an id
refers to (i.e. whether to the whole document, the DTD, or some included
entity). The resolver only sees the id, and has to deliver the file if
it knows how to do that.
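To illustrate the principle, here is a toy model in plain OCaml. It is
not PXP's implementation and uses none of its types: id, resolver,
public_as_file, system_as_file and first_competent are made up for this
sketch, and "/local/menu-1.0.dtd" is a placeholder path. The order in
which resolvers are asked is also just an assumption of the model.

```ocaml
(* Toy model only: these types are NOT PXP's, they just mimic the idea. *)
type id =
  | Public of string * string  (* PUBLIC id together with its SYSTEM id *)
  | System of string           (* SYSTEM id only *)

exception Not_competent

(* A resolver sees nothing but the id. It returns a file name,
   or raises Not_competent if it cannot handle that id. *)
type resolver = id -> string

(* Maps known PUBLIC ids to local files and refuses everything else. *)
let public_as_file (table : (string * string) list) : resolver = function
  | Public (pubid, _) ->
      (match List.assoc_opt pubid table with
       | Some file -> file
       | None -> raise Not_competent)
  | System _ -> raise Not_competent

(* Accepts SYSTEM ids as plain file names and refuses PUBLIC ids. *)
let system_as_file : resolver = function
  | System path -> path
  | Public _ -> raise Not_competent

(* Ask each resolver in turn; the first competent one wins. *)
let rec first_competent (resolvers : resolver list) (x : id) : string =
  match resolvers with
  | [] -> raise Not_competent
  | r :: rest ->
      (try r x with Not_competent -> first_competent rest x)

let () =
  let table =
    [ "-//freedesktop//DTD Menu 1.0//EN", "/local/menu-1.0.dtd" ] in
  let resolve = first_competent [ system_as_file; public_as_file table ] in
  (* The document itself is found through its SYSTEM id ... *)
  print_endline (resolve (System "/my/file.xml"));
  (* ... and the DTD through its PUBLIC id; the URL is never fetched. *)
  print_endline
    (resolve
       (Public ("-//freedesktop//DTD Menu 1.0//EN",
                "http://standards.freedesktop.org/menu-spec/menu-1.0.dtd")))
```

Neither resolver is told whether it is being asked for the document or
for the DTD; each one decides on the id alone.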

What comes close to your idea:
      * Use from_channel or from_obj_channel to read the document to
        parse. The whole document is assigned a private id in this case,
        and not a PUBLIC or SYSTEM id.
      * Define a resolver that maps all PUBLIC ids to the DTD you want
        to have (see below).
      * Leave support for SYSTEM ids out.

There is no good support for mapping all PUBLIC ids to a single DTD.
What you can do is this:

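let c =
  new Pxp_reader.resolve_to_any_obj_channel
    ~channel_of_id:(fun rid ->
      (* Assumption: rid is a Pxp_types.resolver_id record whose
         rid_public field is set whenever a PUBLIC id is present. *)
      match rid.Pxp_types.rid_public with
      | Some _ ->
          let ch =
            new Netchannels.input_channel (open_in "/your/dtd") in
          (ch, None, None)
      | None ->
          raise Pxp_reader.Not_competent)
    ()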

This should work (but I haven't tested it). Then pass c in ~alt as
explained.
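For the channel-based variant, the call could be sketched as follows.
This is equally untested and assumes the same
Pxp_tree_parser.parse_document_entity entry point with the default
configuration and spec. The function name parse_from_channel and its
parameters are hypothetical; any resolver, such as the c defined above,
can be passed in.

```ocaml
(* Sketch only; needs the pxp findlib package.
   parse_from_channel and xml_path are hypothetical names. *)
let parse_from_channel (c : Pxp_reader.resolver) xml_path =
  let ic = open_in xml_path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
       (* The document read from the channel carries no PUBLIC or
          SYSTEM id of its own; ids found inside it are handed to
          the resolver passed via ~alt. *)
       let source = Pxp_types.from_channel ~alt:[c] ic in
       Pxp_tree_parser.parse_document_entity
         Pxp_types.default_config source Pxp_tree_parser.default_spec)
```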

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd at gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------