[Svnmerge] correct parsing of xml output

Mattias Engdegård mattias at virtutech.se
Mon Jan 23 06:16:43 PST 2006


get_copyfrom() does not parse the output of svn log --xml correctly.
Regexes are greedy, so .* can match across several <path> elements and
pick up the copyfrom attributes of the wrong one (incorrect behaviour).
We were just bitten by this bug.
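
To illustrate the mismatch, here is a minimal sketch that runs the old and
the new pattern over a made-up, trimmed piece of log output. The paths,
revisions and the copyfrom() helper are hypothetical and only serve the
example; the two patterns are the ones from the patch.

```python
import re

# Hypothetical "svn log -v --xml" fragment, newlines already replaced by
# spaces as get_copyfrom() does. The first <path> is an unrelated copy.
out = ('<paths> '
       '<path copyfrom-path="/wrong" copyfrom-rev="3" action="A">/branches/other</path> '
       '<path copyfrom-path="/trunk" copyfrom-rev="10" action="A">/branches/b</path> '
       '</paths>')
rlpath = "/branches/b"

def copyfrom(match):
    # Same two attribute lookups as in get_copyfrom().
    attrs = match.group(1)
    head = re.search(r'copyfrom-path="([^"]*)"', attrs).group(1)
    rev = re.search(r'copyfrom-rev="([^"]*)"', attrs).group(1)
    return head, rev

# Old pattern: the match starts at the first <path and spans both elements.
old = re.search(r'(<path .*action="A".*>%s</path>)' % rlpath, out)
# New pattern: [^>]* cannot leave the tag it started in.
new = re.search(r'<path ([^>]*action="A"[^>]*)>%s</path>' % rlpath, out)

print(copyfrom(old))
print(copyfrom(new))
```

With this input the old pattern reports the copy source of /branches/other,
while the new one reports the source of /branches/b.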

--- svnmerge.py	(revision 18197)
+++ svnmerge.py	(working copy)
@@ -479,7 +479,7 @@
     out = launchsvn('log -v --xml --stop-on-copy "%s"' % dir, split_lines=False)
     out = out.replace("\n", " ")
     try:
-        m = re.search(r'(<path .*action="A".*>%s</path>)' % rlpath, out)
+        m = re.search(r'<path ([^>]*action="A"[^>]*)>%s</path>' % rlpath, out)
         head = re.search(r'copyfrom-path="([^"]*)"', m.group(1)).group(1)
         rev = re.search(r'copyfrom-rev="([^"]*)"', m.group(1)).group(1)
         return head,rev

This improves it to the point of actually working in practice, but since > 
is legal inside attribute values it is not completely correct.
Doing this correctly would make the regexp more complicated; the two
instances of [^>]* would have to be replaced by something like

      ([^>'"]*("[^"]*"|'[^']*'))*

where ' needs to be escaped somehow (I suggest using a triple-quoted
string instead of a "raw" string). Add this if you care enough.
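
A rough sketch of that variant, not a tested replacement for the patch.
Assumptions on my side: the names ATTRS and added_path_attrs() are made up,
the sample input is hypothetical, a raw triple-quoted string is used so that
neither quote character needs escaping, and a \s* is added after each run
because the sub-pattern as written must end in a quoted value and so cannot
consume the blank before action= or before the closing >.

```python
import re

# Zero or more runs of "non-quote text followed by one quoted value".
# Raw triple-quoted, so both ' and " can appear unescaped.
ATTRS = r'''(?:[^>'"]*(?:"[^"]*"|'[^']*'))*'''

def added_path_attrs(out, rlpath):
    """Return the attribute text of the added <path> for rlpath, or None."""
    pattern = (r'<path (' + ATTRS + r'\s*action="A"' + ATTRS + r'\s*)>'
               + re.escape(rlpath) + r'</path>')
    m = re.search(pattern, out)
    return m.group(1) if m else None

# Hypothetical input with a literal > inside an attribute value.
out = '<path copyfrom-path="/trunk/a>b" copyfrom-rev="10" action="A">/branches/b</path>'

attrs = added_path_attrs(out, "/branches/b")
# The [^>]* pattern from the patch stops at the > inside the value.
simple = re.search(r'<path ([^>]*action="A"[^>]*)>%s</path>' % "/branches/b", out)

print(attrs)
print(simple)
```

On this input the [^>]* pattern finds nothing, while the quote-aware one
returns the full attribute text.
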
Proper XML parsing would be best, but I suppose you have some reason for
not doing this (compatibility with old Python version perhaps).
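
For comparison, a minimal sketch of what proper XML parsing might look like,
assuming a Python that ships xml.etree.ElementTree (it only entered the
standard library with Python 2.5). The function name copyfrom_from_log() and
the sample log are hypothetical; this is not svnmerge code.

```python
import xml.etree.ElementTree as ET

def copyfrom_from_log(xml_text, rlpath):
    """Return (copyfrom-path, copyfrom-rev) for the added path rlpath.

    Returns None if no matching <path action="A"> with copy information
    exists. Unlike the regexp version, this skips added paths that carry
    no copyfrom-path attribute instead of failing on them.
    """
    root = ET.fromstring(xml_text)
    for path in root.iter("path"):
        if (path.get("action") == "A"
                and (path.text or "").strip() == rlpath
                and path.get("copyfrom-path") is not None):
            return path.get("copyfrom-path"), path.get("copyfrom-rev")
    return None

# Hypothetical log output; note the literal > inside an attribute value.
sample = """<log>
<logentry revision="11">
<paths>
<path copyfrom-path="/wrong" copyfrom-rev="3" action="A">/branches/other</path>
<path copyfrom-path="/trunk/a>b" copyfrom-rev="10" action="A">/branches/b</path>
</paths>
</logentry>
</log>"""

print(copyfrom_from_log(sample, "/branches/b"))
```

The parser deals with quoting and entity decoding itself, so neither greedy
matching nor > inside attribute values is an issue here.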

Please apply this patch or the suggested variant.



