[Ocaml-pxp-users] How do I resolve PUBLIC external entities from inside DTD files?

Gerd Stolpmann gerd at gerd-stolpmann.de
Thu May 14 07:14:44 PDT 2009


This is a bad interaction between the file resolver (inside
Pxp_types.from_file) and the catalog resolver. What happens is this:
Because HTMLlat1 also has a file name attached to the PUBLIC name, the
file resolver tries to open the entity by file name. However, because
this is an "inner" PUBLIC entity, the information about which directory
the file name is relative to has been lost, so the file cannot be opened.

If it is an option for you, you can remove this file name attachment
from PUBLIC, as in

<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "" >

(note the empty string). 

Otherwise, I'd recommend reversing the resolution order: first try the
catalog, then the file resolution, something like

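(* "catalog" is the catalog resolver already defined in your program *)
let file_resolver =
  new Pxp_reader.resolve_as_file ...
let resolver =
  (* catalog first, file resolution only as a fallback *)
  new Pxp_reader.combine [ catalog; file_resolver ]
let file_url =
  Pxp_reader.make_file_url "test.html"
let source =
  Pxp_types.ExtID(Pxp_types.System(Neturl.string_of_url file_url), resolver)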

(untested, however).
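
To illustrate why the order matters, here is a toy model in plain OCaml.
It does not use PXP at all: the types and the functions toy_catalog,
toy_file_resolver and toy_combine are made up for this sketch, and the
assumption that the first resolver claiming an ID decides the outcome is
a simplification of what the real resolvers do.

```ocaml
(* Toy model only: nothing here is part of PXP. *)
type ext_id = { public_id : string; system_id : string }

type outcome =
  | Opened of string   (* path the resolver would open *)
  | Failed of string   (* resolver claimed the ID but could not open it *)
  | Not_competent      (* resolver does not handle this ID *)

type resolver = ext_id -> outcome

(* Catalog: maps known PUBLIC names to local files. *)
let toy_catalog table : resolver = fun id ->
  match List.assoc_opt id.public_id table with
  | Some path -> Opened path
  | None -> Not_competent

(* File resolver: claims every ID with a non-empty system ID, but fails
   when the base directory is unknown (modelled as base_dir = None). *)
let toy_file_resolver base_dir : resolver = fun id ->
  if id.system_id = "" then Not_competent
  else
    match base_dir with
    | Some dir -> Opened (Filename.concat dir id.system_id)
    | None -> Failed ("no base directory for " ^ id.system_id)

(* Try the resolvers in order; the first one that claims the ID decides. *)
let rec toy_combine resolvers id =
  match resolvers with
  | [] -> Not_competent
  | r :: rest ->
    (match r id with
     | Not_competent -> toy_combine rest id
     | decided -> decided)

let lat1 =
  { public_id = "-//W3C//ENTITIES Latin 1 for XHTML//EN";
    system_id = "xhtml-lat1.ent" }

let cat = toy_catalog [ (lat1.public_id, "xhtml-lat1.ent") ]
let files = toy_file_resolver None   (* directory information is lost *)

(* File resolver first: = Failed "no base directory for xhtml-lat1.ent" *)
let file_first = toy_combine [ files; cat ] lat1

(* Catalog first: = Opened "xhtml-lat1.ent" *)
let catalog_first = toy_combine [ cat; files ] lat1

(* Empty system ID, file resolver first: = Opened "xhtml-lat1.ent" *)
let empty_system_id = toy_combine [ files; cat ] { lat1 with system_id = "" }
```

In this model both suggestions have the same effect: either the catalog
gets the first chance, or the file resolver has nothing to claim because
the system ID is empty.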

The SVN version of PXP also contains a tentative fix: the file resolver
is no longer used when the directory information is lost, so the
problematic case is avoided.

BTW, two of the PUBLIC names in your catalog contain a stray "1"
("Symbols 1" and "Special 1"). It must read

let catalog =
  new Pxp_reader.lookup_public_id_as_file
    [ ("-//W3C//DTD XHTML 1.0 Transitional",		"xhtml1-transitional.dtd");
      ("-//W3C//ENTITIES Latin 1 for XHTML//EN",   	"xhtml-lat1.ent");
      ("-//W3C//ENTITIES Symbols for XHTML//EN", 	"xhtml-symbol.ent");
      ("-//W3C//ENTITIES Special for XHTML//EN", 	"xhtml-special.ent") ]
;;
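
A quick way to catch such typos is to compare the catalog keys with the
PUBLIC names that are actually requested. The following is only a sketch
in plain OCaml: missing_public_ids is a hypothetical helper, not a PXP
function, it assumes the names are compared as exact strings, and the
"requested" list is simply taken from the corrected catalog above.

```ocaml
(* Hypothetical helper, not part of PXP: return the requested PUBLIC
   names that have no entry in the catalog (exact string comparison). *)
let missing_public_ids catalog requested =
  List.filter (fun id -> not (List.mem_assoc id catalog)) requested

(* The entries as written in the original pxptest.ml. *)
let original_entries =
  [ ("-//W3C//DTD XHTML 1.0 Transitional", "xhtml1-transitional.dtd");
    ("-//W3C//ENTITIES Latin 1 for XHTML//EN", "xhtml-lat1.ent");
    ("-//W3C//ENTITIES Symbols 1 for XHTML//EN", "xhtml-symbol.ent");
    ("-//W3C//ENTITIES Special 1 for XHTML//EN", "xhtml-special.ent") ]

(* The names assumed to be requested by test.html and the DTD. *)
let requested =
  [ "-//W3C//DTD XHTML 1.0 Transitional";
    "-//W3C//ENTITIES Latin 1 for XHTML//EN";
    "-//W3C//ENTITIES Symbols for XHTML//EN";
    "-//W3C//ENTITIES Special for XHTML//EN" ]

let unmatched = missing_public_ids original_entries requested

(* Prints one line per PUBLIC name that the catalog cannot resolve. *)
let () =
  List.iter (fun id -> print_endline ("not in catalog: " ^ id)) unmatched
```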


Hope this helps,

Gerd


On Thursday, 14.05.2009, at 19:28 +1200, Glyn Webster wrote:
> I'm trying to work out how to use PXP, so I've written a toy program to read XHTML 
> files. It resolves PUBLIC IDs to files in the current directory, where I've placed 
> W3C's XHTML DTD files. It can open the DTD file from inside an XHTML file, but it 
> does not open PUBLIC entity files from within the DTD. I've attached the program 
> and the error that comes up. Please, could anyone tell me what I'm doing wrong?
> 
>  ************************************************************
> glyn@ela:~/Ocamldtd/Test$ ls
> pxptest.ml  xhtml1-transitional-2.dtd  xhtml-lat1.ent     xhtml-symbol.ent
> test.html   xhtml1-transitional.dtd    xhtml-special.ent
> 
> glyn@ela:~/Ocamldtd/Test$ ocamlfind ocamlc -package pxp -linkpkg pxptest.ml && ./a.out
> In entity [toplevel] = SYSTEM "file://localhost/home/glyn/Ocamldtd/Test/test.html", at line 4, position 64:
> In entity [dtd] = PUBLIC "-//W3C//DTD XHTML 1.0 Transitional" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd", at line 29, position 0:
> ERROR: Unable to open the external entity: HTMLlat1 = PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent"
> 
>  ************************************************************
>  pxptest.ml
> 
> let catalog =
>   new Pxp_reader.lookup_public_id_as_file
>     [ ("-//W3C//DTD XHTML 1.0 Transitional",		"xhtml1-transitional.dtd");
>       ("-//W3C//ENTITIES Latin 1 for XHTML//EN",   	"xhtml-lat1.ent");
>       ("-//W3C//ENTITIES Symbols 1 for XHTML//EN", 	"xhtml-symbol.ent");
>       ("-//W3C//ENTITIES Special 1 for XHTML//EN", 	"xhtml-special.ent") ]
> ;;
> 
> try
>   let document = 
>     Pxp_tree_parser.parse_document_entity 
>       Pxp_types.default_config
>       (Pxp_types.from_file ~alt:[catalog] "test.html")
>       Pxp_tree_parser.default_spec
>   in
>   document # root # write (`Out_channel stdout) `Enc_utf8
> with exn ->
>   print_endline (Pxp_types.string_of_exn exn)
> ;;
> 
>  ************************************************************
>  test.html
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html PUBLIC
>        "-//W3C//DTD XHTML 1.0 Transitional"
>        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
>     <head>
>     ...etc...
> 
> --Glyn
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd at gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------




