[Ocaml-pxp-users] Re: Question about interaction style with complex documents

Gerd Stolpmann info at gerd-stolpmann.de
Thu Jun 2 04:10:53 PDT 2005


On Tuesday, 31.05.2005, at 14:00 +0100, Richard Jones wrote:
> [I'm still trying to parse WSDL files ...]
> 
> Part of the problem is trying to do pattern matching across multiple
> levels of Pxp document.  I end up with a lot of intractable code like
> the code attached below (and that's "simplified" with lots of helper
> functions).
> 
> There are two related problems here.  Firstly the object access style
> means I can't really use OCaml pattern matching, but instead have to
> do:
> 
>   match node#node_type with
>     | Pxp_document.T_element name when name = ... ->
> 
> and the second is that because I'm using namespaces I have to grab the
> normprefix for the document and match on:
> 
>   ... when name = xsd ^ ":nodename" ->
> 
> It would all be a lot simpler if there was a way to (a) convert the
> tree to a more convenient OCaml structure which would allow pattern
> matching, and (b) normalise the namespace prefixes so that they are
> what I expect (eg. "xsd:nodename").

PXP can do namespace normalization, so (b) should be no problem. This is
quite simple to arrange. Provided mng is the namespace manager, just do

mng # add_namespace "xsd" "http://whatever/";
mng # add_uri "xsd" "http://an-alias-for-whatever"

This forces all prefixes for http://whatever/ and
http://an-alias-for-whatever to be normalized to xsd, so you can simply
match with

match node#node_type with
    | Pxp_document.T_element "xsd:nodename" -> ...

> I guess I can write code easily enough to do (a).  I'm not sure
> whether XML allows (b) to be possible.

In principle, the XML namespace rules allow (b). However, you will run
into problems when prefixes occur within XML data (this is the case in
XML Schema definitions, for example), because these prefixes are not
rewritten. PXP allows the user to normalize such prefixes later, but
you must do this yourself.
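
As a rough illustration of that manual step, here is a minimal sketch in
plain OCaml. It does not use PXP at all: normalize_qname, scope and norm
are hypothetical names, and it assumes you have already collected the
prefix declarations in effect at the element (scope) and the mapping from
namespace URIs to the prefixes you want (norm).

```ocaml
(* Hypothetical helper, not part of PXP: rewrite a QName found in
   attribute data (e.g. base="s:string") to use the normalized prefix. *)

(* [scope]: prefix -> URI declarations in effect at the element.
   [norm]:  URI -> normalized prefix chosen by the program. *)
let normalize_qname ~scope ~norm qname =
  match String.index_opt qname ':' with
  | None -> qname                       (* no prefix: leave untouched *)
  | Some i ->
      let prefix = String.sub qname 0 i in
      let local = String.sub qname (i + 1) (String.length qname - i - 1) in
      (match List.assoc_opt prefix scope with
       | None -> qname                  (* undeclared prefix: leave untouched *)
       | Some uri ->
           (match List.assoc_opt uri norm with
            | None -> qname             (* URI has no normalized prefix *)
            | Some p -> p ^ ":" ^ local))

(* Example data: the document declares "s", the program wants "xsd". *)
let scope = [ "s", "http://whatever/" ]
let norm =
  [ "http://whatever/", "xsd"; "http://an-alias-for-whatever", "xsd" ]
```

With these tables, normalize_qname ~scope ~norm "s:string" yields
"xsd:string", while names without a prefix or with an unknown prefix are
returned unchanged.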

Regarding (a), the easiest approach is to use the event parser, and to
build your own tree from the events. In the sources of Pxp_document you
can find the function [solidify] that shows how to create trees from
events (although this function creates Pxp_document trees, it shows the
principle).
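
The principle can be sketched without PXP as follows. The event type
below is a simplified stand-in and not PXP's real event type, and tree,
tree_of_events and describe are names made up for this illustration. The
point is only that a plain variant tree allows nested pattern matching.

```ocaml
(* Simplified, hypothetical event type (PXP's real events carry more). *)
type event =
  | Start of string * (string * string) list   (* element name, attributes *)
  | End of string
  | Data of string

(* A plain variant tree that ordinary pattern matching can take apart. *)
type tree =
  | Element of string * (string * string) list * tree list
  | Text of string

(* Consume events up to the end tag matching [name]; return the children
   in document order together with the remaining events. *)
let rec children name acc = function
  | End n :: rest when n = name -> (List.rev acc, rest)
  | End n :: _ -> failwith ("unexpected end tag: " ^ n)
  | Data s :: rest -> children name (Text s :: acc) rest
  | Start (n, atts) :: rest ->
      let kids, rest' = children n [] rest in
      children name (Element (n, atts, kids) :: acc) rest'
  | [] -> failwith ("missing end tag: " ^ name)

let tree_of_events = function
  | Start (n, atts) :: rest ->
      (match children n [] rest with
       | kids, [] -> Element (n, atts, kids)
       | _, _ :: _ -> failwith "trailing events after root element")
  | _ -> failwith "expected a start tag"

(* Matching across two levels of the document in a single pattern. *)
let describe = function
  | Element ("xsd:simpleType", _, [ Element ("xsd:restriction", atts, _) ]) ->
      "simple type based on " ^ List.assoc "base" atts
  | Element ("xsd:complexType", _, [ Element ("xsd:sequence", _, _) ]) ->
      "sequence"
  | _ -> "other"
```

Note that describe relies on the prefixes being normalized to xsd already,
and that List.assoc raises Not_found if the base attribute is missing.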

You might also consider using a stream parser. You can find an
example in examples/pullparser/pull.ml. Stream parsers are especially
useful when the syntax of the XML document is non-trivial.
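
To give an idea of the style (this is not the code from pull.ml), here is
a small recursive-descent sketch over a pull function of type
unit -> event option. The event type is again a simplified stand-in, and
pull_of_list, enumerations and restriction are hypothetical names;
pull_of_list only simulates a parser by handing out events from a list.

```ocaml
(* Simplified, hypothetical event type for this sketch. *)
type event =
  | Start of string * (string * string) list   (* element name, attributes *)
  | End of string

(* Turn a list into a pull function, standing in for a real pull parser. *)
let pull_of_list evs =
  let rest = ref evs in
  fun () ->
    match !rest with
    | [] -> None
    | e :: tl -> rest := tl; Some e

(* Read empty <xsd:enumeration value="..."/> elements until the closing
   </xsd:restriction>, returning the values in document order. *)
let rec enumerations pull acc =
  match pull () with
  | Some (Start ("xsd:enumeration", atts)) ->
      (match pull () with
       | Some (End "xsd:enumeration") ->
           enumerations pull (List.assoc "value" atts :: acc)
       | _ -> failwith "expected </xsd:enumeration>")
  | Some (End "xsd:restriction") -> List.rev acc
  | _ -> failwith "unexpected event inside <xsd:restriction>"

(* Parse one <xsd:restriction base="..."> element from the stream. *)
let restriction pull =
  match pull () with
  | Some (Start ("xsd:restriction", atts)) ->
      let base = List.assoc "base" atts in
      (base, enumerations pull [])
  | _ -> failwith "expected <xsd:restriction>"
```

Each function consumes exactly the events of the construct it parses, so
the grammar of the document is mirrored by the call structure.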

> So the question is how have people solved these problems?  They must
> surely be common with any serious XML work ...

Well, the majority of XML programmers are fighting with DOM, which
knows nothing about prefix normalization, and they are programming in
languages that cannot do pattern matching. So this is often quite painful.

Gerd

> 
> Rich.
> 
> (I guess the other possibility is something like CDuce).
> 
> ----------------------------------------------------------------------
>       match node#node_type with
> 	| Pxp_document.T_element name when name = xsd ^ ":simpleType" ->
> 	    let restriction =
> 	      let subnodes = filter (xsd ^ ":restriction") node#sub_nodes in
> 	      match subnodes with
> 		| [h] -> h
> 		| _ ->
> 		    failwith "Xsd.parse: expecting a single <restriction> subnode" in
> 	    let base = get_attr "base" restriction in
> 	    let bt =
> 	      try parse_basic_type base
> 	      with Not_found ->
> 		failwith ("Xsd.parse: expecting a basic type, but got " ^
> 			  base) in
> 
> 	    (* If we have <enumeration> subnodes, then really it's an
> 	     * enumeration.
> 	     *)
> 	    let enums = filter (xsd ^ ":enumeration") restriction#sub_nodes in
> 	    let enums = List.map (get_attr "value") enums in
> 
> 	    SimpleType (bt, enums)
> 
> 	| Pxp_document.T_element name when name = xsd ^ ":complexType" ->
> 	    (match node#sub_nodes with
> 	       | [] -> ComplexType Unix
> 	       | [n] when
> 		   n#node_type = Pxp_document.T_element (xsd ^ ":sequence") ->
> 		   ComplexType (Sequence (parse_sequence n))
> 	       | [n] when
> 		   n#node_type = Pxp_document.T_element (xsd ^ ":all") ->
> 		   ComplexType (All (parse_all n))
> 	       | _ ->
> 		   failwith "Xsd.parse: multiple subnodes of <complexType>")
> 
> 
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd at gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------




