[Ocaml-pxp-users] How validate a XML string with a DTD

Gerd Stolpmann info at gerd-stolpmann.de
Thu Apr 24 09:56:17 PDT 2003


On Wed, 2003-04-23 at 19:37, amata at tsc.uc3m.es wrote:
> 
> 
> Hi list;
> 
> I have a problem using PXP. I need to parse an XML document and 
> validate it against a DTD. To do that, I use the function 
> parse_document_entity. The XML document that I want to validate is not a 
> file, it is a string. I do this:
> 
> let d = Pxp_yacc.parse_document_entity default_config (from_string 
> mystring) default_spec
> 
> The string to validate begins with the lines:
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE COMPETENCIA SYSTEM "./todo.dtd">
> 
> I need to validate the XML string against the DTD todo.dtd, but when I 
> run the program, it reports:
> 
> ERROR: No input method available for this external entity: [dtd] = 
> SYSTEM "./todo.dtd"
> 
> If I put the string into a file and do this:
> 
> let d = Pxp_yacc.parse_document_entity default_config 
> (from_file "myfile.xml") default_spec
> 
> all works perfectly.

What do you think SYSTEM "./todo.dtd" means? I guess you want to access
the file todo.dtd in the current directory, but this is not the meaning
of this clause. Actually, it points to the file todo.dtd relative to the
current file (like hyperlinks in HTML), and this explains what is going
wrong. If the current file has a name in the filesystem, it is possible
to resolve "./todo.dtd", but if the current file is only a string, it
does not have any name, and the parser cannot resolve the relative name
"./todo.dtd".

I hope the problem is clear now.
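
The resolution rule described above can be modelled in a few lines of
plain OCaml. This is only an illustrative sketch built on the standard
Filename module, not PXP's actual code; the function name resolve_system
and the path /data/xml/myfile.xml are made up for the example.

```ocaml
(* Toy model of resolving a relative SYSTEM identifier; not PXP code. *)

(* [resolve_system ~base rel] returns the path that [rel] points to, or
   None when the referring entity has no file name to resolve against. *)
let resolve_system ~(base : string option) (rel : string) : string option =
  if not (Filename.is_relative rel) then Some rel
  else
    match base with
    | None -> None                 (* entity is only a string: no base *)
    | Some file -> Some (Filename.concat (Filename.dirname file) rel)

let () =
  let show = function Some p -> p | None -> "<cannot resolve>" in
  (* Document read from a file: the directory of that file is the base. *)
  print_endline
    (show (resolve_system ~base:(Some "/data/xml/myfile.xml") "./todo.dtd"));
  (* Document read from a string: there is nothing to resolve against. *)
  print_endline (show (resolve_system ~base:None "./todo.dtd"))
```

The second call corresponds to the from_string case: without a base
name, the relative name has no meaning.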

There are a number of ways to work around this problem. Unfortunately,
the function from_string cannot read from files at all (not even if
you use an absolute path like "file:/path/.../todo.dtd"), but it
is possible to create a custom source that can do this using the
Pxp_reader module (see below). Furthermore, I must admit that the
current implementation (including PXP-1.1.93) is quite limited regarding
name resolution. It was simply my first attempt at addressing these
problems. I have a better Pxp_reader implementation on my disk, but it
is not yet ready to be released.

I think the best solution is to assign todo.dtd a public name, i.e.
using PUBLIC "my-name" instead of SYSTEM:

open Pxp_reader;;
open Pxp_types;;

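(* Abstract name for the in-memory document string s *)
let pid = Private (allocate_private_id());;

let resolver =
  new combine [ (* serves the string s under the name pid *)
                new resolve_read_this_string ~id:pid s;
                (* maps the public name to the DTD file *)
                new lookup_public_id_as_file
                  [ "my-name", "/dir/.../todo.dtd" ]
              ];;

let source = ExtID(pid, resolver);;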

Here, s is the string to parse. The value pid is the abstract name of
the string (comparable to a SYSTEM or PUBLIC name). The resolver is the
engine that maps such names to real files or strings. The source can be
used instead of the from_string call.

Now you can refer to the DTD by PUBLIC "my-name" "", and the parser maps
this name to the file assigned to this name.
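
To make the roles of the name, the resolver, and the combination more
concrete, here is a toy model in plain OCaml. It is not the Pxp_reader
API: the types name and location, the three functions, and the path
/hypothetical/dtds/todo.dtd are all invented for illustration, and the
real classes do considerably more (encodings, opening channels, and so
on).

```ocaml
(* Toy model of a combined resolver; illustrative types, not Pxp_types. *)
type name =
  | Public_name of string    (* like PUBLIC "my-name" *)
  | System_name of string    (* like SYSTEM "./todo.dtd" *)
  | Private_name of int      (* abstract name of an in-memory string *)

type location = In_memory of string | On_disk of string

(* A resolver either knows a name or declines with None. *)
type resolver = name -> location option

(* Knows exactly one private name and answers with the given text. *)
let read_this_string ~id text : resolver =
  fun n -> if n = Private_name id then Some (In_memory text) else None

(* Looks public names up in a table of (public name, file) pairs. *)
let public_id_as_file table : resolver = function
  | Public_name p ->
      (match List.assoc_opt p table with
       | Some file -> Some (On_disk file)
       | None -> None)
  | _ -> None

(* Ask each resolver in turn; the first one that answers wins. *)
let rec combine (rs : resolver list) : resolver = fun n ->
  match rs with
  | [] -> None
  | r :: rest ->
      (match r n with Some _ as hit -> hit | None -> combine rest n)

let () =
  let r =
    combine [ read_this_string ~id:0 "<doc/>";
              public_id_as_file [ "my-name", "/hypothetical/dtds/todo.dtd" ] ]
  in
  assert (r (Private_name 0) = Some (In_memory "<doc/>"));
  assert (r (Public_name "my-name")
          = Some (On_disk "/hypothetical/dtds/todo.dtd"));
  (* No resolver in the list handles this kind of name. *)
  assert (r (System_name "./todo.dtd") = None)
```

In this model a name is found only if some resolver in the list
recognizes it; everything else is reported as unresolvable.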

This construction does not allow SYSTEM names, though. If you need them,
you can put 

new resolve_as_file()

into the list passed to "combine". But as I mentioned, the current
implementation of Pxp_reader is far from optimal. For example, you cannot
access files from within todo.dtd by a relative SYSTEM name, although
todo.dtd has an assigned file name. If you need such features, wait
until the next PXP version is released.

Anyway, I hope this explanation already helps.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd at gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------



