[Ocaml-pxp-users] How do I resolve PUBLIC external entities from inside DTD files?

Gerd Stolpmann gerd at gerd-stolpmann.de
Mon May 18 06:54:15 PDT 2009


On Friday, 15.05.2009, at 14:38 +1200, Glyn Webster wrote:
> On Fri, 15 May 2009 2:14:44 am you wrote:
> > This is a bad interaction of the file resolver (inside
> > Pxp_types.from_file), and the catalog resolver. It tries to do this:
> > Because HTMLlat1 also has a file name attached to the PUBLIC name, the
> > file resolver tries to open the entity by file name. However, the
> > information is lost relative to which directory the file is to be
> > opened, because it is an "inner" PUBLIC entity.
> 
> Thank you. Your "Pxp_reader.combine" solution worked for me. (Editing the DTD 
> worked too, but I want to avoid that.) 
> 
> Was the way I was trying to do things originally sensible? Or is what you have 
> shown me here how I should have done things from the start?
> 
> I think I understand the first part of what was happening (file resolution is 
> applicable to PUBLIC ids that provide filenames, the "~alt" resolver is only 
> used if the default resolver is inapplicable), but I still don't understand 
> why the directory information gets lost, though. 

Well, name resolution is a complex thing. It is easy to make errors,
both for the developer and for the user. Also, there is no concise
definition from the W3C of how things ought to work.

I'd say what you tried originally is reasonable, and should work.
Because of this, I've changed the PXP code.

For resolving SYSTEM identifiers one has to track the base URI, i.e. the
directory in "URL speak". For example, if you have:

http://somewhere/file1.xml:
  <!ENTITY file2 SYSTEM "dir/file2.xml">

http://somewhere/dir/file2.xml:
  <!ENTITY file3 SYSTEM "file3.xml">

Then file3 is located in dir (as with an HTML hyperlink). However, if
there is a PUBLIC name in between, there is no base URI anymore. This is
what I mean by "directory information gets lost".
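
To make the base URI tracking concrete, here is a minimal sketch in plain
OCaml. It is not PXP's resolver code; the names base_dir, is_absolute and
resolve_system are hypothetical, and the model assumes that an entity
reached through a PUBLIC name simply has no base URI (None).

```ocaml
(* Illustrative only: hypothetical helpers, not part of PXP. *)

(* Directory part of a URI, including the trailing slash:
   "http://somewhere/dir/file2.xml" becomes "http://somewhere/dir/". *)
let base_dir uri =
  match String.rindex_opt uri '/' with
  | Some i -> String.sub uri 0 (i + 1)
  | None -> ""

(* Crude test for an absolute URI: does the id contain "://"? *)
let is_absolute id =
  let n = String.length id in
  let rec scan i =
    i + 3 <= n && (String.sub id i 3 = "://" || scan (i + 1))
  in
  scan 0

(* Resolve a SYSTEM id against the base URI of the entity that refers to it.
   [base] is None when that entity was reached through a PUBLIC name, in
   which case a relative SYSTEM id cannot be resolved. *)
let resolve_system ~base id =
  if is_absolute id then Some id
  else
    match base with
    | Some b -> Some (base_dir b ^ id)
    | None -> None

(* The chain from the example above. *)
let file2 = resolve_system ~base:(Some "http://somewhere/file1.xml") "dir/file2.xml"
let file3 = resolve_system ~base:file2 "file3.xml"

(* The same reference, but from an entity that has no base URI. *)
let lost = resolve_system ~base:None "file3.xml"
```

Here file3 ends up below dir because file2 carries its own URI as the
base, while lost is None: with no base, a relative name has nothing to be
resolved against.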

In XML things are even harder, because an entity can have both a SYSTEM
and a PUBLIC name. However, when you look up an entity, you pick only one
of them. So it is easy to get errors like yours: on one level both
identifiers would work, the parser picks the unexpected one, and the
next name resolution fails.

Generally, I'd recommend avoiding PUBLIC identifiers and using only
SYSTEM ones. You can easily map http:// URIs to real files with the
class rewrite_system_id. That way, each document has a base URI, and you
never run into this kind of problem.
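
The idea behind such a mapping can be sketched in plain OCaml as a prefix
rewrite on SYSTEM ids. This is only an illustration of the concept, not
the implementation or the signature of rewrite_system_id (see the
Pxp_reader documentation for that); the function names, the URL prefix
and the local directory below are all made up.

```ocaml
(* Illustrative only: hypothetical names and paths, not the PXP API. *)

(* Each pair maps a URL prefix to the local prefix that replaces it. *)
let rewrites = [ ("http://somewhere/", "file:///usr/share/xml/somewhere/") ]

let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Replace the first matching prefix; ids that match nothing are
   returned unchanged. *)
let rewrite table id =
  let rec go = function
    | [] -> id
    | (src, dst) :: rest ->
      if starts_with ~prefix:src id then
        let n = String.length src in
        dst ^ String.sub id n (String.length id - n)
      else go rest
  in
  go table

let local = rewrite rewrites "http://somewhere/dir/file2.xml"
let untouched = rewrite rewrites "http://elsewhere/other.xml"
```

Because only the prefix changes, the rewritten id keeps its directory
part (dir/ in the example), so relative SYSTEM ids inside that file still
have a base to resolve against.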

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd at gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------




