[Svnmerge] [Fwd: Merge tracking proposal]

Fri Apr 28 16:13:11 PDT 2006

FWIW, when the dust settles from discussion on the Subversion dev
list, I'll be integrating this information into the docs:

  http://subversion.tigris.org/merge-tracking/

On Fri, 28 Apr 2006, Blair Zajac wrote:

> There's a merge tracking proposal put up by Daniel Berlin here:
> 
> http://svn.haxx.se/dev/archive-2006-04/0916.shtml
> 
> Blair
> 
> -- 
> Blair Zajac, Ph.D.
> <blair at orcaware.com>
> Subversion training, consulting and support
> http://www.orcaware.com/svn/

> From: Daniel Berlin <dberlin at dberlin.org>
> To: dev <dev at subversion.tigris.org>
> Date: Fri, 28 Apr 2006 09:34:04 -0700
> Subject: Merge tracking proposal
> 
> Among other things I am working on at Google, I have been tasked
> full-time with implementing merge tracking.
> 
> As part of this, I have come up with a design I plan on implementing for
> tracking what revisions have been merged where, in a manner that is
> suitable for use by various other operations (history sensitive merging,
> etc).
> 
> In doing so, I reviewed the use cases that were kindly written up, and
> believe that most if not all of them can be accomplished with this
> design.
> 
> Please remember that this design is *only* for tracking what changes are
> merged where.  I expect this to be the easy part, compared to deciding
> exactly what algorithms our history sensitive merge uses, and how it
> proceeds.
> 
> I have divided the design into four portions: "Goals", "information
> storage", "information updating", and "other prereqs to being able to
> implement the design".
> 
> The "random questions and answers" section collects questions that other
> developers raised while I was coming up with this design, in the hope
> that it will also answer some common queries the list may have.
> 
> Goals:
> 
> The overarching goal here is to track the revision numbers being merged
> by a merge operation, and to keep this information in the right places
> as various operations (copy, delete, add, etc.) are performed.
> 
> The goals of this design are:
> 1. To be able to track this down to individual files in a working copy,
> and to determine which revisions have been merged into each file.
> 
> 2. To not need to contact the server more than we already do now to
> determine which revisions have been merged into a file or directory
> (i.e., some contact is acceptable; asking the server about each file is
> not).
> 
> 3. To be able to edit merge information in a human editable form.
> 
> 4. For the information to be stored in a space efficient manner, and to
> be able to determine the revisions merged into a given file/directory in
> a time efficient manner.
> 
> 5. To still get a conservatively correct answer (no worse than what we
> have now) when no merge info is specified.
> 
> 6. To be able to collect, transmit, and keep this information up to date
> as much as possible on the client side.
> 
> 7. To be able to index this information in the future in order to
> answer queries.
> 
> Specific Non-goals for *this design* include:
> 1. Doing actual history sensitive merging
> 2. Curing cancer (aka being all things to all people)
> 
> When reading the design presented here, please remember that it is
> impossible to get something perfect in Subversion on the first try, and
> attempting to nitpick this to death will not actually help anything,
> but it would be very annoying.  This is not to dissuade people from
> suggesting design changes, but if you plan on suggesting a different
> revision list format because you believe colon doesn't have a good level
> of synergy with existing separators, or something, you may want to
> rethink whether it really matters.
> 
> Some pre-notes:
> The one argument I continually have with myself is whether to store info
> in revprops, or just on dirs and files. If you want to try to
> convincingly argue one way or the other, go for it. Certainly, I think
> storing it only on dirs and files makes the semantics of the operations
> below clearer and makes it easier to decide how to proceed; the
> question is whether it is efficient enough time-wise when we go to
> retrieve merge info, and whether it complicates what merge has to do
> too much.  It also removes all of the listed pre-reqs :).
> 
> One could also try to argue that we should start with exactly the same
> cases svnmerge does (i.e., only allow merge info at the wc roots, only
> store it on that directory, etc.), with a nicer integrated interface, and
> try to expand it from there. I am open to such an argument as well. :)
> 
> Anyway, on with the design.
> 
> 
> Information storage
> 
> The first question that many people ask is "where should we store the
> merge information" (what we store will be covered next).
> 
> After a large amount of research, the design I have come up with is
> this:
> A merge info property, named SVN_MERGE_PROPERTY (not the real name, I
> have made it a constant so we can have a large bikeshed about what to
> really call it) stored in the revision properties, directory properties,
> and file properties.
> Each will store the *full, complete* list of current merged in changes,
> as far as it knows.  This ensures that the merge algorithm and other
> consumers do not have to walk back revisions in order to get the
> transitive closure of the revision list.
> 
> The way we choose which of the file, dir, or revprop merge info to use
> in case of conflicts is a simple system of inheritance[1] where the
> "most specific" place wins.  This means that if the property is set on
> a file, that completely overrides the directory and revision level
> properties.
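The "most specific place wins" lookup can be sketched in Python as below. This is a hypothetical data model, not Subversion code: `node_props` maps each versioned path that carries the merge info property to its value, `revprop` is the revision-level value (or None), and walking up through parent directories with the nearest one winning is an assumption about how directory-level info applies to the files beneath it.

```python
import posixpath
from typing import Dict, Optional


def effective_merge_info(path: str,
                         node_props: Dict[str, str],
                         revprop: Optional[str]) -> Optional[str]:
    """Return the merge info that applies to `path` (hypothetical helper).

    The file's own property wins; otherwise the nearest ancestor directory
    carrying the property wins; otherwise fall back to the revprop.
    """
    current = path
    while True:
        if current in node_props:
            return node_props[current]
        parent = posixpath.dirname(current)
        if parent == current:  # reached "" or "/" without a match
            return revprop
        current = parent


props = {
    "a/branches/bar": "/trunk:1-4",
    "a/branches/bar/foo": "/trunk:5-9",
}
print(effective_merge_info("a/branches/bar/foo/x.c", props, None))  # /trunk:5-9
print(effective_merge_info("a/branches/bar/y.c", props, None))      # /trunk:1-4
```

A file with no property of its own, under no directory that has one, falls through to the revprop value.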
> 
> The way we choose which to store to depends on how much and where you
> merge, and will be covered in the semantics.
> 
> The reasoning for this system is to avoid having to either copy info
> everywhere, or crawl everywhere, in order to determine which revisions
> have been applied.  At the same time, we want to be space and time
> efficient, so we can't just store the entire revision list everywhere.
> 
> As for what is stored:
> 
> Among the large number of people I have talked to and heard about from
> others, it seems the human editable *format* in which svnmerge stores
> merge information (i.e., pathname and list of revisions) is fine.
> Binary storage of such information would buy, on average, a 2-3 byte
> decrease per revision/range in size over ascii[2], while making it not
> directly human editable.
> 
> As such, I have chosen to represent the revisions we have merged *into*
> something as a path, a colon, and then a comma separated revision list,
> containing one or more revisions or revision ranges.  The beginning and
> end points of a revision range are separated by "-".
> 
> So the grammar looks something like this:
> 
> revisionrange -> REVISION "-" REVISION
> 
> revisionelement -> revisionrange | REVISION
> 
> revisionlist -> revisionelement (COMMA revisionelement)*
> 
> revisionline -> PATHNAME COLON revisionlist
> 
> top -> revisionline (NEWLINE revisionline)*
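For illustration, here is a small Python parser for the grammar above. It is a sketch, not the proposal's implementation: it assumes each revision line is split on its *last* colon (a revision list never contains one, so a path containing a colon still parses), represents every element as an inclusive (start, end) pair, and accepts an empty list after the colon, which the reverse-merge rules below can produce.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (start, end); a single revision N is (N, N)


def parse_element(text: str) -> Range:
    """revisionelement -> revisionrange | REVISION"""
    if "-" in text:
        start_text, end_text = text.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        start = end = int(text)
    if start > end:
        raise ValueError(f"backwards range: {text!r}")
    return start, end


def parse_merge_info(value: str) -> Dict[str, List[Range]]:
    """top -> revisionline (NEWLINE revisionline)*"""
    info: Dict[str, List[Range]] = {}
    for line in value.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        path, colon, revs = line.rpartition(":")
        if not colon or not path:
            raise ValueError(f"malformed revision line: {line!r}")
        if revs.strip():
            elements = [parse_element(e.strip()) for e in revs.split(",")]
        else:
            elements = []  # e.g. after every revision was reverse-merged
        info.setdefault(path, []).extend(elements)
    return info


print(parse_merge_info("/trunk:1-9\n/branches/foo:3,5-7"))
# {'/trunk': [(1, 9)], '/branches/foo': [(3, 3), (5, 7)]}
```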
> 
> This list will *not* be stored in a canonicalized minimal form for a
> path (i.e., it may contain single revision numbers that could be ranges).
> This is chiefly because the benefit of such a canonical format (slightly
> easier *comparison*, but not indexing) is heavily outweighed by the fact
> that generating a canonical form may require groveling through a lot of
> information to determine what that minimal canonical form is.  In
> particular, it may be that the revision list "5,7,9" is, in minimal
> canonical form, "5-9", because 6 and 8 do not have any effect on the
> pathname that 5 and 9 are from.
> Canonicalization could be done as a server side post pass because the
> information is stored in properties.
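The "5,7,9" versus "5-9" example can be made concrete with a short Python sketch. It assumes a hypothetical set `inert` of revisions already known not to touch the source path; computing that set from repository history is exactly the expensive step the paragraph above argues against doing at merge time, which is why it could be left to a server-side pass.

```python
from typing import Iterable, List, Set, Tuple


def canonicalize(revisions: Iterable[int], inert: Set[int]) -> List[Tuple[int, int]]:
    """Collapse merged revisions into as few inclusive ranges as possible.

    Two neighbouring merged revisions may share a range only if every
    revision strictly between them is in `inert` (did not change the
    source path), so the ranges still describe the same set of changes.
    """
    ranges: List[Tuple[int, int]] = []
    for rev in sorted(set(revisions)):
        if ranges:
            start, end = ranges[-1]
            if all(r in inert for r in range(end + 1, rev)):
                ranges[-1] = (start, rev)
                continue
        ranges.append((rev, rev))
    return ranges


def format_ranges(ranges: List[Tuple[int, int]]) -> str:
    return ",".join(str(s) if s == e else f"{s}-{e}" for s, e in ranges)


print(format_ranges(canonicalize([5, 7, 9], inert={6, 8})))  # 5-9
print(format_ranges(canonicalize([5, 7, 9], inert={6})))     # 5-7,9
```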
> 
> Note that this revision format will not scale on its own if you have a
> list of a million revisions.  None will easily.  However, because it is
> stored in properties, one can change the wc and fs backends to simply do
> something different with this single property if they wanted to.
> Given the rates of change of various very active repositories, this will
> not be a problem we need to solve for many many years.
> 
> Information updating:
> Each operation you can perform may update or copy the merge info
> associated with a path, file, or revision.
> 
> 
> svn add:  No change to merge info
> svn delete: No direct change to merge info (indirectly, because the
> props go away, so does the merge info for the file)
> svn rename: No change to merge info
> svn copy: Copies the merge info from the source path to the destination
> path, if any.
> 
> This includes copying info from revprops, if necessary, by determining
> if the merge info exists in a revprop for the last changed commit for
> the source path, and copying it to the new revprop if it does (someone
> probably needs to check if this is the right semantic :P)
> 
> All copies are full-copies of the merge information.
> 
> svn merge: Adds or subtracts to the merge info, according to the
> following:
> 
> Where to put the info:
> 1. If the merge target is a single file, the merge info goes to the
> property SVN_MERGE_PROPERTY set on that file.
> 2. If the merge target is a non-wc-root directory, the merge info goes
> to the property SVN_MERGE_PROPERTY set on the directory.
> 3. If the merge target is a wc-root directory, the merge info goes to
> the SVN_MERGE_PROPERTY revision property.
> 
> What info is put:
> 1. If you are merging in reverse, revisions are subtracted from the
> revision lines, but we never write out anti-revisions.  Thus, if you
> subtract all the merged revisions, you just get an empty list, and if
> you do a reverse merge from there, you still get an empty list.
> 2. If you are merging forward, the revision(s) you are merging are added
> to the revision line in sorted order (such that all revisions and
> revision ranges in the list are monotonically increasing from left to
> right).  The exact details of how the range is represented in terms of a
> list of single revs, or a revision range, is left as a quality of
> implementation detail.  The only requirement is that the range be
> correct.
> 3. The path (known as PATHNAME in the grammar) used as the key to
> determine which revision line to change is the subdirectory path being
> merged from, relative to the repo root, with the repo url stripped from
> it.
> 
> Thus a merge of revisions 1-9 from http://foo.bar.com/reposroot/trunk
> would produce "/trunk:1-9"
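The update rules above can be sketched in Python as follows. This is one possible quality-of-implementation choice, not something the proposal mandates: revision lines are kept as sorted, non-overlapping inclusive ranges, and overlapping or directly adjacent ranges are coalesced, which is always safe because adjacent revisions leave no gap to reason about. The repository root URL is assumed to be known to the client, and all function names are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (start, end)


def normalize(ranges: List[Range]) -> List[Range]:
    """Sort ranges and coalesce overlapping or adjacent ones (rule 2)."""
    out: List[Range] = []
    for start, end in sorted(ranges):
        if out and start <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


def add_revisions(line: List[Range], merged: List[Range]) -> List[Range]:
    """Forward merge: add the merged revisions to the revision line."""
    return normalize(line + merged)


def subtract_revisions(line: List[Range], reverted: List[Range]) -> List[Range]:
    """Reverse merge: remove revisions; never record anti-revisions (rule 1)."""
    pieces = normalize(line)
    for r_start, r_end in reverted:
        remaining: List[Range] = []
        for start, end in pieces:
            if r_end < start or r_start > end:  # no overlap, keep as-is
                remaining.append((start, end))
                continue
            if start < r_start:
                remaining.append((start, r_start - 1))
            if r_end < end:
                remaining.append((r_end + 1, end))
        pieces = remaining
    return pieces


def merge_source_key(source_url: str, repos_root_url: str) -> str:
    """PATHNAME key: the merge source relative to the repository root (rule 3)."""
    root = repos_root_url.rstrip("/")
    if source_url != root and not source_url.startswith(root + "/"):
        raise ValueError("cross-repository merge source")
    return "/" + source_url[len(root):].lstrip("/")


key = merge_source_key("http://foo.bar.com/reposroot/trunk",
                       "http://foo.bar.com/reposroot")
line = add_revisions([], [(1, 9)])
print(f"{key}:{line[0][0]}-{line[0][1]}")  # /trunk:1-9
print(subtract_revisions(line, [(3, 5)]))  # [(1, 2), (6, 9)]
print(subtract_revisions([], [(1, 9)]))    # []
```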
> 
> Cross-repo merging is a bridge we can cross if we ever get there :).
> 
> 
> pre-reqs for this design:
> 
> 1. Need to be able to set a revprop to be stored on commit
> 2. Need to be able to say to copy a revprop from a particular revision
> and only contact the server at commit time.
> 
> 3. Need to be able to have auth treat SVN_MERGE_PROPERTY revprop
> differently from other revprops (either by special casing the cases
> users do care about controlling, or special casing props users don't
> care about controlling, etc) so that people who don't have access to the
> revprops can still do history sensitive merges of directories they do
> have access to.
> 
> 
> Random questions and answers
> 
> What happens if someone commits a merge with a non-merge tracking
> client?
> It simply means the next time you merge, you may receive conflicts that
> you would have received if you were using a non-history-sensitive
> client.
> 
> Can we do without the revprop portion of this design?
> Technically yes, AFAIK, but it may require more crawling and querying at
> merge time.
> 
> Can we do history sensitive wc<->wc merges without contacting the server?
> No. But you probably couldn't anyway, even if the issue of the revprop
> not being stored locally were not here.
> 
> What happens if the info is not there?
> The same thing that happens if the info is not there now.
> 
> What happens if a user edits merge info incorrectly?
> They get the results specified by their merge info.
> 
> How does the revprop stay up to date?
> We copy it from revision to revision.
> 
> What happens if a user manually edits a file and unmerges a revision (i.e.,
> not using a "reverse merge" command), but doesn't update the merge info
> to match?
> The merge info will believe the change has still been merged.
> 
> What happens if i svn move/rename a directory, and then merge it
> somewhere?
> This doesn't change history, only the future, thus we will simply add
> the merge info for that directory as if it was a new directory.  We will
> not do something like attempt to modify all merge info to specify the
> new directory, as that would be wrong.
> 
> I don't think that only copying info on svn copy is correct.  What if you
> copy a dir with merge info into a dir that has merge info?  Won't it get
> the info wrong now?
> 
> No.  
> 
> Let's say you have
> 
> a/foo (merge info: /trunk:5-9)
> a/branches/bar (merge info: /trunk:1-4)
> 
> If you copy a/foo into a/branches/bar, we now have
> 
> a/branches/bar (merge info: /trunk:1-4)
> a/branches/bar/foo (merge info: /trunk:5-9)
> 
> This is strictly correct.  The only changes which have been merged into
> a/branches/bar/foo are still 5-9.  The only changes which have been
> merged into a/branches/bar are 1-4.  No merges have been performed by
> your copy, only copies have been performed.  If you perform a merge of
> revisions 1-9 into bar, one would expect the history sensitive merge
> algorithm to skip revisions 5-9 for a/branches/bar/foo, and to skip
> revisions 1-4 for a/branches/bar.
> The merge info above gives the algorithm what it needs to do this.
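The copy example can be checked with a few lines of Python. This is only a sketch of what a history sensitive merge could compute from the merge info, not the merge algorithm itself (which the proposal leaves out of scope); it expands ranges into sets, which is fine for an illustration but not for large histories. Following the "most specific wins" rule, it assumes a/branches/bar's info covers bar except for the foo subtree, which has its own.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (start, end)


def revisions_to_apply(requested: range, merged: List[Range]) -> List[int]:
    """Requested revisions not yet recorded as merged into a path."""
    already = {rev for start, end in merged for rev in range(start, end + 1)}
    return [rev for rev in requested if rev not in already]


# Merge info after copying a/foo into a/branches/bar.
merge_info: Dict[str, Dict[str, List[Range]]] = {
    "a/branches/bar": {"/trunk": [(1, 4)]},
    "a/branches/bar/foo": {"/trunk": [(5, 9)]},
}

# Merge revisions 1-9 of /trunk into bar.
for path, info in merge_info.items():
    print(path, revisions_to_apply(range(1, 10), info.get("/trunk", [])))
# a/branches/bar [5, 6, 7, 8, 9]
# a/branches/bar/foo [1, 2, 3, 4]
```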
> 
> So if you want to argue svn copy has the wrong merge info semantics,
> it's not because of the above, AFAIK :)
> 
> 
> I'm sure that even in this long document, I've forgotten some things I
> did spec out.
> Apologies in advance.
> 
> 
> Footnotes:
> [1] This is not going to be a full blown design for property
> inheritance, nor should this design depend on such a system being
> implemented.
> 
> [2] Assuming 4 byte revision numbers, and repos with revisions numbering
> in the hundreds of thousands.  You could do slightly better by variable
> length encoding of integers, but even that will generally be 4 bytes for
> hundreds of thousands of revs.  Thus, we have strings like "102341" vs 4
> byte numbers, meaning you save about 2 bytes for a 4 byte integer.
> Range lists in binary would need a distinguisher from single revisions,
> adding a single bit to both (meaning you'd get 31 bit integers), and
> thus, would require 8 bytes per range vs 12 bytes per range.  While 30%
> is normally nothing to sneeze at space wise, it's also not significantly
> more efficient in time, as most of the time will not be spent parsing
> revision lists, but doing something with them. The space efficiency
> therefore does not seem to justify the cost you pay in not making them
> easily editable.
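The footnote's arithmetic can be reproduced with Python's `struct` module, assuming fixed-width big-endian 32-bit integers and a hypothetical encoding in which the top bit of a range's first number marks it as a range. The footnote's 12-byte figure counts the twelve digits of two six-digit revisions; the "-" separator makes the ASCII form 13 bytes.

```python
import struct
from typing import Optional

RANGE_FLAG = 0x80000000  # hypothetical: top bit marks the start of a range


def ascii_size(start: int, end: Optional[int] = None) -> int:
    text = str(start) if end is None else f"{start}-{end}"
    return len(text.encode("ascii"))


def binary_size(start: int, end: Optional[int] = None) -> int:
    if end is None:
        return len(struct.pack(">I", start))  # flag bit clear: single revision
    return len(struct.pack(">II", start | RANGE_FLAG, end))


print(ascii_size(102341), binary_size(102341))                  # 6 4
print(ascii_size(102341, 102399), binary_size(102341, 102399))  # 13 8
```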
> 
> 
> 

> _______________________________________________
> Svnmerge mailing list
> Svnmerge at orcaware.com
> http://www.orcaware.com/mailman/listinfo/svnmerge

-- 

Daniel Rall