[Svnmerge] [PATCH] performance fix for some platforms

Tue Jul 3 08:58:00 PDT 2007

Better late than never.  Part of the test failures (missing "--force" in
tests) were caused by r23052. But r22788, a good idea that was just
incomplete, is more interesting.  See if my thinking makes sense or not.


The problem is that r22788 is initializing to the wrong revision number.  If
we copy, say, branch from trunk (becoming, respectively, the merge source and
merge target--I prefer the names source and target in the code, since the
merges could go in any direction), then there are 2 revision numbers of
interest:
   1) the particular revision number of the trunk (merge target) from which
   the branch (merge source) was copied
   2) and the revion number in which the copy was committed to create the
   branch (merge source)
#1 above is the one that r22788 uses. This causes problems I'll demonstrate
below. #2 is the one that we want to use for initializing a branch in *this
scenario*.  We want to use #2 because the changes that occur between revisions
#1 and #2 are not of interest (we are trying to merge in changes that occurred
ON the branch, which didn't exist before #2).

Note however, that this situation reverses if the trunk becomes the merge
source, and the branch becomes the merge target (i.e., one is merging into a
branch from trunk, to stay synchronized with the changes on trunk).  In that
case we do want to initialize to revision #1 above.  I'll try to handle this
in a subsequent patch.

(Another scenario is the one that svnmerge already handled automatically: when
neither the merge source nor target is a copy of the other. In this case it
inits svnmerge on the target to the latest revision of the merge source,
saying in effect, "everything has been merged, so start tracking from right
now).

r22788 is the cause of a discussion we had earlier, though we didn't realize
it then:

>> +        """todo:
>> +        Why does init require parameters in this case???try it without the 
>> +        "?r 1?13" parameter to see the errors. Is this a 
>> +        bug that ppl will run into a lot & should be fixed somehow, maybe by 
>> +        excluding that revision range, in init, automatically? (why 
>> +        did init, w/o parameters, only init to 1?6, when it wasn't created 
>> +        until 12?)  Similarly, if you just do "svnmerge init" then 
>> +        "svnmerge block 12", it fails because it gets the logs to only process
>> +        revision #'s that actually show up in the logs as changed for that
>> +        particular file (to omit things changed on other urls, I think), and 
>> +        12 isn't one of those.
>> +        """
>
>I ran into this problem too ?? it would be nice to fix, but that's
>another patch :)

To reproduce a simplified example of the problem, one can insert this into
svnmerge_test.py, inside the class TestCase_TestRepo(TestCase_SvnMerge), and
run  "svnmerge_test.py TestCase_TestRepo.testMergeFileRenameOntoModifiedFile".
If you trade the commented line for the one after it, then it gets no error.
    def testScenarioCausedByr22788(self):
        os.chdir(self.test_path)
        os.chdir("trunk")
        
        #self.svnmerge2(["init", self.test_path + "/test-branch", "-r 1-13"])
        self.svnmerge2(["init", self.test_path + "/test-branch"])
        
        self.multilaunch("""
            svn up
            svn ci -m 'initialize svnmerge'
        """)
        os.chdir("../trunk")
        self.svnmerge("merge")

r22788 also broke compliance with this statement in "svnmerge help init":
    "If SOURCE is specified, all the revisions in SOURCE are marked as already
    merged"
This was no longer true, in the case where the source was copied from the
target.  

This gets the error:
    LaunchError: (1, 'svn merge --force -r 12:13
    file:///tmp/__svnmerge_test/repo/branches/test-branch .', "svn: Unable to
    find repository location for
    'file:///tmp/__svnmerge_test/repo/branches/test-branch' in revision 12\n")
..because test-branch didn't exist in r12, but svnmerge init thinks it did. 
It shows that r22788 messes up default behavior even in a simple case.

So I provided additional logic to complete the helpful thinking behind r22788,
and renamed a few variables for clarity and readability (like clearing up the
source/target role issue a little).

It passes the tests, at least on gnu/linux (suse 10.0) and windows xp.  I
think the existing tests as modified are adequate for this change; they aren't
designed specifically to communicate it, as would be ideal, but they do
exercise the code and fail when needed.


Is this version comment ok?  Why would we the modified filenames, as shown in
the hacker's guide example, since they are also in the "log -v" output
already? Is it a place to comment each modified function? In this case maybe
it makes more sense to just put in the one comment line, and no filenames, but
I'm wondering what I'm missing.

[[[
Fix bug & related test failures caused by r22788.  Improve default handling of
"init" command as r22788 intended, for scenario where merge source is a copy
of the merge target.

* contrib/client-side/svnmerge/svnmerge.py
* contrib/client-side/svnmerge/svnmerge_test.py
]]]




(ps--can I be an svnmerge maintainer, as Daniel Rall suggested?)
















Thanks to both for the reviews; I will try to reply to both
though I'm quoting only one.

>>> On Mon, Jul 2, 2007 at  6:41 am Giovanni Bajo <rasky at develer.com> wrote: 
> 1) You can't import subprocess at the top of the file without guarding it
> with try/except. Please, do the import immediately above the launch()
> function and totally replace its definition in case the module is
> available. Something like:
> 
> try:
>    import subprocess
>    def launch(...):
>        # subprocess version
> 
> except ImportError:
>    def launch(...):
>        # old version

Done.  Yes, this is much better. It still lacks a test with an older
python, as would be ideal, but I did run all tests w/
python 2.5.1 on windows and 2.4 on suse.
 
> 2) I dislike your manual argument quoting code. The splitting part
should
> be done with shlex.split() (but that follows POSIX, not Windows CMD),
and
> the merging with subprocess.list2cmdline (under Windows). Anyway, I
don't
> get why you need to do *anything* at all, since Popen() accepts both a
> single string or a list of strings.

I don't like it either but without it, many tests fail on linux.  Your
suggestion is a good one I'd like to implement ... but I think instead
I should work on things like getting the rest of my patches in, to
address parts of the "rename + merge = dataloss" problem, testing out
more scenarios to see what else could go wrong with merges and think of
how to address them.  Even better yet, I'd like to get up to speed and
work on a couple of things in svn itself to help address "tree
collisions" and to avoid the problem of silent dataloss with
merge+rename, altogether.  As time permits.

> 3) As a followup patch, please do change all launch's users to get a list
> of strings.

I hope you'll forgive me, but ...same as prev comment on priorities.  I
think anything we can do to get by for now and solve the core problem in
svn in some way is time better spent.  But maybe I'm missing something.
 
> 4) Explicitally specify close_fds=False in the Popen() call, since that's
> the "raison d'etre" of the whole patch.

Done.

> 5) p.returncode==0 is a sufficient test for success. Why do you also check
> that stderr output is empty? I find that irrelevant.

That seemed cheaper than the research to guarantee that all called
programs consistently return a proper status code versus stderr writes.
But I don't feel strongly, and have changed it as you suggested.

> 6) Why do you need to specify a different "executable=" argument for the
> two operating systems? Isn't the default OK? Also, why do you think that we
> need to go through the shell under nt? I thought that wasn't required.

It simply didn't work on windows until I did.  Various errors.  (Actually, python has been inconsistent for me across platforms in that regard as well as in regular expressions.)

Thanks for the review and suggestions!  I ran the tests on Windows 
also, but unfortunately not with an older python version.

(Any thoughts on my other patch, for fixing the older regression test
errors?  I'd like to put these to rest if possible so I can start
sending in others...)


-Luke
-------------- next part --------------

--- contrib/client-side/svnmerge/svnmerge.py-be4PerfFix	2007-06-29 15:20:30.000000000 -0600
+++ contrib/client-side/svnmerge/svnmerge.py	2007-07-03 09:47:57.000000000 -0600
@@ -201,35 +201,99 @@
     exit code of the process, the original command line, and the output of the
     command."""
 
-def launch(cmd, split_lines=True):
+try:
     """Launch a sub-process. Return its output (both stdout and stderr),
     optionally split by lines (if split_lines is True). Raise a LaunchError
     exception if the exit code of the process is non-zero (failure)."""
-    if os.name not in ['nt', 'os2']:
-        p = popen2.Popen4(cmd)
-        p.tochild.close()
-        if split_lines:
-            out = p.fromchild.readlines()
-        else:
-            out = p.fromchild.read()
-        ret = p.wait()
-        if ret == 0:
-            ret = None
+    import subprocess
+    def launch(cmd, split_lines=True):
+        # Requiring python 2.4 or higher, on some platforms we get 
+        # much faster performance from the subprocess module (python
+        # doesn't try to close an exhorbitant number of file descriptors)
+        
+        # todo: The next two little sections, handling argument quoting, could
+        # probably go away if we change all callers of this method to pass in
+        # the parameter "cmd" as a Python list of arguments instead of as a
+        # string.
+        ''' Need to keep quoted arguments together as single parameters, so can't 
+        use plain cmd.split(). Is there a better way than this?'''
+        args = re.split("(\"[^\"]*\")|(\'[^\']*\')|\s",cmd)
+        while None in args:
+            args.remove(None)
+        while '' in args:
+            args.remove('')
+        
+        ''' (We need to remove double quoting from args that are quoted twice--it 
+        didn't matter when we used the older subprocess handler (popen2) but now 
+        parameters that were previously quoted (probably because they might 
+        contain spaces or odd chars), get re-quoted when we convert args to 
+        strings via the above call to "re.split()", thus we have double quotes 
+        that have to be removed or it breaks.)
+        '''
+        counter = 0
+        for arg in args:
+            if (arg[0]=='"' or arg[0]=="'") and (arg[-1]=='"' or arg[-1]=="'"):
+                args[counter] = arg[1:-1]
+            counter = counter + 1
+        
+        
+        stdout = ""
+        stderr = ""
+        try:
+            if os.name == 'nt':
+                p = subprocess.Popen(args, executable=os.environ['COMSPEC'], \
+                                     stdout=subprocess.PIPE, close_fds=False,\
+                                     stderr=subprocess.PIPE, shell=True)
+            else:
+                p = subprocess.Popen(args, executable=args[0], \
+                                     stdout=subprocess.PIPE, close_fds=False,\
+                                     stderr=subprocess.PIPE)
+            stdoutAndErr = p.communicate()
+            stdout = stdoutAndErr[0]
+            stderr = stdoutAndErr[1]
+        except OSError, inst:
+            # Using 1 as failure code; should get actual number somehow? For 
+            # examples see svnmerge_test.py's TestCase_launch.test_failure and 
+            # TestCase_launch.test_failurecode.
+            raise LaunchError(1, cmd, stdout + " " + stderr + ": " + str(inst)) 
+        
+        if p.returncode == 0:
+            if split_lines:
+                # Setting keepends=True for compatibility with previous logic 
+                # (where file.readlines() preserves newlines)
+                return stdout.splitlines(True)
+            else:
+                return stdout
         else:
-            ret >>= 8
-    else:
-        i,k = os.popen4(cmd)
-        i.close()
-        if split_lines:
-            out = k.readlines()
+            raise LaunchError(p.returncode, cmd, stdout + stderr)
+except ImportError:
+    # support versions of python before 2.4 (slower on some systems)
+    def launch(cmd, split_lines=True):
+        if os.name not in ['nt', 'os2']:
+            p = popen2.Popen4(cmd)
+            p.tochild.close()
+            if split_lines:
+                out = p.fromchild.readlines()
+            else:
+                out = p.fromchild.read()
+            ret = p.wait()
+            if ret == 0:
+                ret = None
+            else:
+                ret >>= 8
         else:
-            out = k.read()
-        ret = k.close()
-
-    if ret is None:
-        return out
-    raise LaunchError(ret, cmd, out)
-
+            i,k = os.popen4(cmd)
+            i.close()
+            if split_lines:
+                out = k.readlines()
+            else:
+                out = k.read()
+            ret = k.close()
+        
+        if ret is None:
+            return out
+        raise LaunchError(ret, cmd, out)
+    
 def launchsvn(s, show=False, pretend=False, **kwargs):
     """Launch SVN and grab its output."""
     username = opts.get("username", None)
-------------- next part --------------
--- contrib/client-side/svnmerge/svnmerge_test.py-be4PerfFix	2007-06-30 14:47:10.000000000 -0600
+++ contrib/client-side/svnmerge/svnmerge_test.py	2007-06-30 14:48:15.000000000 -0600
@@ -456,7 +456,8 @@
     def multilaunch(self, cmds):
         for cmd in cmds.split("\n"):
             cmd = cmd.strip()
-            svnmerge.launch(cmd % self.command_dict())
+            if len(cmd) > 0:
+                svnmerge.launch(cmd % self.command_dict())
 
     def revert(self):
         self.multilaunch("svn revert -R .")