[Svnmerge] request for review: #2863 - analyze_source_revs fix

Sun Aug 5 16:58:07 PDT 2007

On Mon, Aug 06, 2007 at 01:13:35AM +0200, Giovanni Bajo wrote:
> It is perfectly correct if you run merges within a single repository,
> which is the only "documented" way to do merges. I appreciate your
> efforts to fully support merges across multiple repositories, but I am
> a little concerned if that slows down operations within a single
> repository (which I am sure is by far the most common type of merge
> which svnmerge.py is used for).
> 
> >  Is there another way to do this?  I think using get_svninfo() on
> >  them to compare UUIDs or repos-roots would cause an extra fetch
> >  anyway.
> 
> Not necessarily: svn info of a local path does not fetch anything.
> Pardon me if I haven't looked at the code for a long time, but *on
> paper* I can't see why a remote fetch should be necessary here. 

I think that, before we can address this specific case, we need to
address the following question: does svnmerge support inter-repository
merges?  If no, then I should withdraw this patch series.  If yes, then
we have some bugs to fix.  I don't think we can have any ambiguity here.

Inter-repository merges work, modulo this bug and the inability to merge
between projects with equal repos-relative paths (which this patch
series intends to fix).

Folks who are doing inter-repository merges, please speak up!

> > analyze_revs has some unused logic to handle the case of an unknown
> > end_rev.  Could we make that an option for speed-conscious users?
> 
> I don't understand what you are suggesting here... can you elaborate
> please?

The code in question determines the end_rev to pass to analyze_revs.
It's the only caller of analyze_revs, and it always supplies end_rev.
But analyze_revs contains conditional code to deal with the case where
end_rev is unknown.  My point was that end_rev is not strictly required,
so one possibility is to drop it entirely.  Other possibilities:
 - use the target HEAD on single-repository merges and the source head
   on multi-repository merges, either with an explicit test in
   analyze_source_revs or by improving the caching in get_latest_rev()
 - offer a --fast argument that disables remote accesses that are not
   strictly necessary (such as this one)
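
To make those options concrete, here is a rough sketch of how the choice
could look.  This is not code from svnmerge.py: choose_end_rev,
fetch_head and head_cache are hypothetical names, and the sketch assumes
the repository roots are already known without a remote fetch.

```python
def choose_end_rev(source_root, target_root, fetch_head, head_cache, fast=False):
    """Pick an end_rev, or return None to leave it unknown.

    Hypothetical helper, not part of svnmerge.py.  fetch_head(root) is
    assumed to contact the repository at `root` and return its HEAD
    revision; head_cache maps repository root -> HEAD already known.
    """
    if fast:
        # --fast: skip every remote access that is not strictly needed
        # and let the caller handle an unknown end_rev.
        return None

    # Single-repository merge: the target's HEAD is good enough.
    # Multi-repository merge: only the source's HEAD is meaningful.
    root = target_root if source_root == target_root else source_root

    # Cache per repository root, so a HEAD that was already looked up
    # for this repository is never fetched a second time.
    if root not in head_cache:
        head_cache[root] = fetch_head(root)
    return head_cache[root]
```

With the cache keyed by repository root, a single-repository merge whose
HEAD is already known costs no extra fetch, and an inter-repository
merge pays for exactly one.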

> > One I've been thinking about is caching immutable information such
> > as RevisionLog information -- I find that the biggest time sink for
> > my merges (which are over svn+ssh://, ugh) comes from downloading
> > the same logs and revision diffs on every merge.
> 
> Yes, but I wouldn't put that within svnmerge.py. Caching part of the
> remote repository into some local space is probably a project on its
> own; I had played with such a toy before. I had designed a tool called
> "lsvn" which would basically have the same UI as "svn" (forwarding
> every command), but cache many things locally (not file
> contents/diffs, but logs and revprops). After that, you can simply
> tell svnmerge.py to run "lsvn" instead of "svn" and be done with it.
> In fact, I guess many users of "svn" would be happy with "lsvn"
> independently of svnmerge.py.

I'm not entirely convinced: svnmerge is already caching information
*within* a run of svnmerge.  I'm only suggesting that this information
be (perhaps optionally) written out to a pickle somewhere to preserve
it between runs.  I think that would drastically speed up most users'
svnmerge experiences -- although it wouldn't fix this particular issue.
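
Roughly what I have in mind, as a sketch only: RevisionLogCache and its
fetch callback are hypothetical names, not anything in svnmerge.py
today, and the sketch assumes that a log entry for a given (repository
UUID, revision) pair can be treated as immutable.

```python
import os
import pickle
import tempfile


class RevisionLogCache:
    """Hypothetical on-disk cache of log data, keyed by (repos UUID, rev)."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            try:
                with open(path, "rb") as f:
                    self.entries = pickle.load(f)
            except (OSError, EOFError, pickle.UnpicklingError):
                # An unreadable cache is not fatal; start empty and refetch.
                self.entries = {}

    def get(self, repos_uuid, rev, fetch):
        """Return the cached entry, calling fetch(repos_uuid, rev) on a miss."""
        key = (repos_uuid, rev)
        if key not in self.entries:
            self.entries[key] = fetch(repos_uuid, rev)
        return self.entries[key]

    def save(self):
        """Write the cache via a temporary file, then rename it into place."""
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.entries, f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, self.path)


if __name__ == "__main__":
    # Demo with a stand-in fetch function instead of a real "svn log".
    cache_file = os.path.join(tempfile.mkdtemp(), "svnmerge-cache.pickle")
    first_run = RevisionLogCache(cache_file)
    first_run.get("example-uuid", 10, lambda uuid, rev: "log for r%d" % rev)
    first_run.save()
    second_run = RevisionLogCache(cache_file)
    print(second_run.entries)
```

Keying on the repository UUID rather than the path keeps two projects
with equal repos-relative paths from sharing cache entries.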

Dustin

-- 
        Dustin J. Mitchell
        Storage Software Engineer, Zmanda, Inc.
        http://www.zmanda.com/