[Svnmerge] [PATCH] fix for rev printing bug in svn status

Dustin J. Mitchell dustin at zmanda.com
Wed Jun 20 22:59:48 PDT 2007


One other thing the devs asked me to do was to divide my patch into a
series of smaller patches.  That might be useful in this case -- for
example, it looks like some of your changes to svnmerge_test.py are
cleanup on the test infrastructure, rather than new tests for your
functionality.  

Similarly, your conversion to the subprocess module probably deserves
its own patch, although that could face an uphill battle -- lots of
folks still use Python 2.3, since lots of RedHat systems are still out
there with 2.3.4.  Presenting it as a separate pach might allow us to
develop a wrapper that can fall back to popen..

Also, this is a pretty significant divergence from the usual operation
of svnmerge, in the sense that it gets down to creating diffs and
applying them.  I got most of the way through the patch, but I don't see
the "big picture", so I can't really comment on the overall structure.
What do you think about writing up a textual description of the changes
for the list?

Here's a review based on reading the patch; I haven't applied and
experimented with it.

On Tue, Jun 19, 2007 at 04:14:18PM -0600, lsuvkne at onemodel.org wrote:
> +        """todo:
> +        Why does init require parameters in this case?--try it without the 
> +        "-r 1-13" parameter to see the errors. Is this a 
> +        bug that ppl will run into a lot & should be fixed somehow, maybe by 
> +        excluding that revision range, in init, automatically? (why 
> +        did init, w/o parameters, only init to 1-6, when it wasn't created 
> +        until 12?)  Similarly, if you just do "svnmerge init" then 
> +        "svnmerge block 12", it fails because it gets the logs to only process
> +        revision #'s that actually show up in the logs as changed for that
> +        particular file (to omit things changed on other urls, I think), and 
> +        12 isn't one of those.
> +        """

I ran into this problem too -- it would be nice to fix, but that's
another patch :)

> +        self.verify_strings_in_file("svnmerge-commit-message.txt", \
> +                                    ["Comments from pre-existing copy", \
> +                                    "In r15 renamed", "r14", "modify test1"])

It's a little visually unclear what the svnmerge commit message should
look like -- perhaps a loose regex over the whole message would work
better?  Some extra comments here might help, too, giving context.

> +    def testMergeDirectoryRenameOnlyOntoModifiedFile(self):
...
> +        self.multilaunch("""
> +            svn add 'dir1/dir2/test6'
> +            svn ci -m 'adding directories and file test6'
> +            svn cp dir1 ../test-branch
> +        """)

There are 13 commits in the base repository, so this is r14

> +        # Initialize svnmerge
> +        #todo: (see cmts at same plc in testMergeFileRenameOntoModifiedFile())
> +        self.svnmerge2(["init", self.test_path + "/test-branch", "-r 1-16"])

Yet you're initializing through r16?

> +        # edit a file in merge target so it will trigger a patch
> +        addedText = 'firstmod'
> +        editfile = open("dir1/dir2/test6", "a")
> +        try:
> +            editfile.write("\n" + addedText)
> +        finally:
> +            editfile.close()
> +        r15Comment = "modify test6 file in merge target"
> +        self.launch("svn up")
> +        self.launch("svn ci -m '"+r15Comment+"'")  # creates r15
> +
> +        # set up initial state on test-branch
> +        os.chdir("../test-branch")
> +        self.launch("svn ci -m 'creating dir1, dir2, and file test6'")  # created r16

This is checking in the copy of dir1 made earlier - does making that
copy early (rather than doing 
  self.multilaunch("""
    svn cp ../trunk/dir1
    svn ci -m 'creating dir1, dir2, and file test7'
  """)
here) have some significance?  I find it a bit confusing to initialize
merge tracking *before* you commit the relevant changes..

> +    def testMergeDirectoryAndFileRenamesOntoModifiedFile(self):

Same things apply here.

> +def log_to_file(msg, filename=None):

I don't see log_to_file() used anywhere..

> +    ''' Need to keep quoted arguments together as single parameters, so can't 
> +    use plain cmd.split(). Is there a better way than this?'''
> +    args = re.split("(\"[^\"]*\")|(\'[^\']*\')|\s",cmd)
> +    while None in args:
> +        args.remove(None)
> +    while '' in args:
> +        args.remove('')
> +    
> +    ''' (We need to remove double quoting from args that are quoted twice--it 
> +    didn't matter when we used the older subprocess handler (popen2) but now 
> +    parameters that were previously quoted (probably because they might 
> +    contain spaces or odd chars), get re-quoted when we convert args to 
> +    strings via the above call to "cmd.split()", thus we have double quotes 
> +    that have to be removed or it breaks.)
> +    '''
> +    counter = 0
> +    for arg in args:
> +        if (arg[0]=='"' or arg[0]=="'") and (arg[-1]=='"' or arg[-1]=="'"):
> +            args[counter] = arg[1:-1]
> +        counter = counter + 1

What you're trying to do here is basically to re-implement the shell's
parser.  I think that the best options would be either to run the
equivalent of "sh -c '%(args)s'" or, preferably, convert all of the
calls to this function to specify a Python list of arguments.

>  def target_to_url(dir):
>      """Convert working copy path or repos URL to a repos URL."""
> -    if is_wc(dir):
> +    # allow dir to be either a directory, a file in a directory, or a file in 
> +    # the current directory, respectively:

This is a change -- typically a single file cannot be considered a
working copy.  What's the justficiation for this?  There should be a
unit test for the new functionality, too.

>      # Try using "svn info URL". This works only on SVN clients >= 1.2
> -    try:
> -        info = get_svninfo(url)
> -        return info["Repository Root"]
> -    except LaunchError:
> -        pass
> +    #try:
> +    info = get_svninfo(url)
> +    return info["Repository Root"]
> +    #except LaunchError:  
> +        #pass
> +    '''todo: I had a bug which was obscured by this original 'pass' line.  
> +    Would it be better to raise the error?  The bug was caused when passing 
> +    'file:///tmp/__svnmerge_test/repo/branches/test-branch/dir1/dir2' 
> +    into target_to_repos_relative_path(), which works if we append "@16" 
> +    (ultimately this data came from the function 
> +    testMergeDirectoryAndFileRenamesOntoModifiedFile() in 
> +    svnmerge_test.py) when it does 'self.svnmerge("merge -r 17")'), but 
> +    it took a while to find the bug and its fix, because this routine 
> +    happily returned (incorrect) data instead of throwing the error.  
> +        If we do permanently this 'except LaunchError' line, then this is
> +    the same logic as at the top of the method--so some of it could
> +    be removed.  Does that make sense or am I missing the 
> +    original thinking behind it?

It's not quite the same, because the second version is operating on the
URL, not the working copy -- from the description, it sounds like that
matters for svn clients in the 1.2 series (I don't have such a beastie
to test with).

As for ignoring errors -- it might be better to find a way to log the
errors in verbose mode, but still ignore them.  The idea is that things
will fail depending on the version of the client.

> -def get_copyfrom(dir):
> +def get_copyfrom(dir, liberal_comparison=False):
>      """Get copyfrom info for a given target (it represents the directory from
>      where it was branched). NOTE: repos root has no copyfrom info. In this case
>      None is returned."""

Is liberal comparison required for this patch, or is it a related but
functionally separate change?  Either way, it needs to be described in
the docstring and tested for.  Similarly, the new return signature of
this function should be described in the docstring.

> +            except AttributeError:
> +                if not liberal_comparison:
> +                    raise AttributeError # todo: how re-raise same one as we caught?

use a bare 'raise'

> +def get_commit_logs_for_rev_range(start, end):
> +    logOutput = launchsvn("log -v -r %d:%d %s" % \
> +                          (start, end, opts["source-url"]))
> +    return logOutput
> +

It would probably be better to roll this into get_commit_log(), with a
different signature (perhaps passing a tuple as "revision" can trigger
this behavior?)

> +def get_adds_and_deletes_and_modifications_from_log(logText):
> +    """ Using the add/delete statements in the log for what is being merged, 
> +    compare them and see which ones can be matched thus are "renames"
> +    For more rationale, see comments in 
> +    check_for_changes_to_preserve_across_merge(start, end).
> +    """
> +    linePattern = re.compile("^\-\-\-")

'-' isn't a special character in a regex, so no need to backslashify it.
You may want to use the string * int trick to get the full width:
    linePattern = re.compile("^" + '-' * 72 + '$')

> +    revisionNumberPattern = re.compile("^r(\d+)")
> +    addFromPattern = re.compile("\s*A (.+) \(from (.+)\d+.+$")
> +    deletedPattern = re.compile("\s*D (.+)$")

The 're' module isn't seeing your backslashes -- prefix the string with
'r' to prevent Python from interpreting the backslashes:
    deletedPattern = re.compile(r"\s*D (.+)$")

> +                        # on windows, python's regexp didn't allow(?) removal 
> +                        # of the version # (":17") off the end with the regular 
> +                        # expression, so do it here.
> +                        addedFromPath = addedFromPath[:addedFromPath.find(':')]  

This should be relatively straightforward with 're' -- maybe it's a
backslash problem?

> +def get_revision_if_earlier_copied_from_merge_source(fileBeingDeletedWorkingCopyName):
> +    """This file may have had changes merged into it from the merge source
> +    previously. If so, find out in which revision that happened, so that
> +    we don't try to get local 
> +    history from before that point--it isn't there, it's on the merge
> +    source, and it isn't the part that is being lost.
> +    """

So let's say that foo.c had this revision history
 r300 - local modification 3
 r280 - local modification 2
 r270 - merge of r255 from source
 r260 - local modification 1
 r250 - merge of r245 from source
 ...
This function would find revision 270, which means that it would miss
the log messages (and diffs?) from r250.

> +def get_nearest_commonancestor_revision(fileBeingDeletedWorkingCopyName,\
> +                                        fileAdded, revisionNumberOfThisRename):
> +    # Find the common ancestor of the two files (one to be deleted and
> +    # one to be added, as the rename does), to see what changed on the
> +    # line that is about to be deleted (since they diverged).
> +    
> +    # We can't (at least not here) use logic like what is at the start
> +    # of the function action_init(), because it wouldn't apply in a case
> +    # where the branch we're working with was not made by copying directly 
> +    # from the 
> +    # current line.  I.e., they both could have been made by copying from
> +    # some other line. We do try to use it in cases when it would work, before
> +    # this current function is called, however.
> +    
> +    # todo: this algorithm IS A KLUDGE (i.e., for determining what revision 
> +    # range we should use on the 
> +    # file to be deleted to create the diff, i.e., what changed
> +    # on this branch in this file, since it and the renamed
> +    # file diverged).  It assumes that any changes on the two branches were never
> +    # checked in during the same commit--in this author's environment that 
> +    # seems like
> +    # a safe assumption, and overall much better than what we had
> +    # before (the destructive renames).  The algorithm assumes that when we 
> +    # find a 
> +    # revision where they were modified on the same commit, they must
> +    # have been the same file at that point in history, therefore we can use 
> +    # that revision # as a starting point.
> +    # The assumption is bad because someone could conceivably have a working copy
> +    # containing both branches, and cd to a directory that contains
> +    # them both and check in changes to both using a single commit
> +    # command).

There was some discussion on the list recently about changes resulting
from conflict-resolution in a merge; that is, r270 above may not be
precise copy of r255.  Will that cause a problem here, too?

> +    # One better way might be to consider using the node revision id (aka 
> +    # "entity id"??)
> +    # as described here http://svn.collab.net/viewvc/svn/trunk/subversion/libsvn_fs_base/notes/structure?view=markup
> +    # starting at "Within the database".  Maybe we could go back in
> +    # history through the revisions for each file, and find the historical
> +    # revision for which they have the same node id.  This however might mean 
> +    # that
> +    # this script would only work with an svn installation that was compiled 
> +    # with python 
> +    # bindings support.

I've been thinking about that -- svnmerge would be a lot more functional
that way, but a lot less compatible.

> +def write_patch_file(diffText, fileAddedStringToEmbed):
> +    fileBeingAddedWorkingCopyNameWithReplacedSlashes = \
> +                                    fileAddedStringToEmbed.replace('/','-')
will this work on windows?

Dustin

-- 
        Dustin J. Mitchell
        Storage Software Engineer, Zmanda, Inc.
        http://www.zmanda.com/



More information about the Svnmerge mailing list