[Svnmerge] Fwd: [Issue 2869] multiple pathid formats

Mon Feb 16 16:26:51 PST 2009

On lun, 2009-02-16 at 10:01 -0500, Dustin J. Mitchell wrote:
> On Mon, Feb 16, 2009 at 8:05 AM, Giovanni Bajo <rasky at develer.com>
> wrote:
> > Where is the latest patch? Attached to the issue tracker?
> 
> Yes:
> 
> http://subversion.tigris.org/nonav/issues/showattachment.cgi/971/locid-final.patch
> 
> Thanks!

Comments:

> 
> + def __eq__(self, other):
> + return self is other
> +
> + def __ne__(self, other):
> +        return self is not other
> +
> + def __hash__(self):
> +        return id(self)
> 

This is the default.

> 
> +        # convert pathid_str to a PathIdentifier
> +        if not locobjs.has_key(pathid_str):
> +            if is_url(pathid_str):
> +                # we can determine every form; pathid_hint knows how to do that
> +                pathid_hint(pathid_str)
> +            elif pathid_str[:7] == 'uuid://':
> +                mo = re.match('uuid://([^/]*)(.*)', pathid_str)
> +                if not mo:
> +                    error("Invalid path identifier '%s'" % pathid_str)
> +                uuid, repo_relative_path = mo.groups()
> +                pathid = PathIdentifier(repo_relative_path, uuid=uuid)
> +                # we can cache this by uuid:// pathid and by repo-relative path
> +                locobjs[pathid_str] = locobjs[repo_relative_path] = pathid
> +            elif pathid_str and pathid_str[0] == '/':
> +                # strip any trailing slashes
> +                pathid_str = pathid_str.rstrip('/')
> +                pathid = PathIdentifier(repo_relative_path=pathid_str)
> +                # we can only cache this by repo-relative path
> +                locobjs[pathid_str] = pathid
> +            else:
> +                error("Invalid path identifier '%s'" % pathid_str)

This would be a perfect PathIdentifier.create() staticmethod, AFAICT. By
moving the code there, you're uniting all the logic about
formatting/parsing of a PathIdentifier string representation within the
class itself.

> -def is_url(url):
> +def is_url(url, check_valid=False):
>      """Check if url is a valid url."""
> -    return re.search(r"^[a-zA-Z][-+\.\w]*://[^\s]+$", url) is not None
> +    if re.search(r"^[a-zA-Z][-+\.\w]*://[^\s]+$", url) is not None and url[:4] != 'uuid':
> +        if not check_valid: return True
> +        return get_svninfo(url, fail_on_invalid=False) != {}
> +    return False
> 

I would prefer if is_url() would stay a purely syntactic function, just
as is_pathid() and is_wc(). Instead of this change, I would add another
function called check_url() (or check_valid_url()), so that you can
replace later occurrences of:

> +            if is_url(cf_url, check_valid=True):
> +                cf_pathid = target_to_pathid(cf_url)

into:

   if is_url(cf_url) and check_url(cf_url):
      [...]

> -    for L in launchsvn('info "%s"' % target):
> +    lines = launchsvn('info "%s"' % target)
> +    if not lines: lines = [ "x: (Not a valid URL)" ]

Why you need this special case?

> 
> -        info[key] = value.strip()
> +        value = value.strip()
> +        if value == '(Not a valid URL)':
> +            if fail_on_invalid:
> +                error("Not a valid URL: %s" % target)
> +            return {} # no info on this URL
> +        info[key] = value

Hmm... I don't this this whole modification to get_svninfo() is worth
it. It looks like your only goal was to implement the URL validity
check. In that case, it's easy even without all this code:

def check_url(url):
    return "URL" in get_svninfo(url)

Notice that this how this is already done: if you look at other usages
of get_svninfo(), you will see that missing items from the dictionary is
sometimes interpreted as a "bad target" signal.

> +def pathid_hint(target):
> +    """Cache some information about target, as it may be referenced by
> +    repo-relative path in subversion properties; the cache can help to
> +    expand such a relative path to a full path identifier."""
> +    if locobjs.has_key(target): return
> +    if not is_url(target) and not is_wc(target): return

What about putting all this code (pathid_hint, target_to_pathid,
locobjs, repo_hints) within the PathIdentifier class?

Eg: pathid_hint -> PathIdentifier.hint(), target_to_pathid ->
staticmetod PathIdentifier.from_target() (factory). The caches could be
moved within the class objects (PathIdentifier.locobjs, ecc.). Or
something like that...

I think it would make the code even easier to read.

> -def check_old_prop_version(branch_target, branch_props):
> -    """Check if branch_props (of branch_target) are svnmerge properties in
> -    old format, and emit an error if so."""
> -
> -    # Previous svnmerge versions allowed trailing /'s in the repository
> -    # local path.  Newer versions of svnmerge will trim trailing /'s
> -    # appearing in the command line, so if there are any properties with
> -    # trailing /'s, they will not be properly matched later on, so require
> -    # the user to change them now.
> -    fixed = {}
> -    changed = False
> -    for source, revs in branch_props.items():
> -        src = rstrip(source, "/")
> -        fixed[src] = revs
> -        if src != source:
> -            changed = True
> -
> -    if changed:
> -        err_msg = "old property values detected; an upgrade is required.\n\n"
> -        err_msg += "Please execute and commit these changes to upgrade:\n\n"
> -        err_msg += 'svn propset "%s" "%s" "%s"' % \
> -                   (opts["prop"], format_merge_props(fixed), branch_target)
> -        error(err_msg)
> -
> 

If I understand correctly your whole patch, you're removing this code
because, after your patch, svnmerge.py will be again able to cope with
this old format. Is this correct?

> +        OptionArg("-L", "--location-type",
> +               dest="location-type",
> +               default="path",
> +               help="Use this type of location identifier in the new " +
> +                    "Subversion properties; 'uuid', 'url', or 'path' " +
> +                    "(default)"),

Why "path" is the default? Is this only some kind of backward
compatibility default? I would say uuid is the best choice here, isn't
it?

The patch looks great altogether. It's a great cleanup and also
introduces a valuable feature for cross-repo merges. Thanks!
-- 
Giovanni Bajo
Develer S.r.l.
http://www.develer.com