[Svnmerge] Unicode in log messages

Fri Oct 9 14:00:45 PDT 2009

Oops, bad paste...
>>> import locale;
>>> print locale.getdefaultlocale()[1];
mac-roman
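
A minimal Python 3 sketch for checking which two codecs the recoding step relies on on a given machine: the one assumed for `svn log` output and the one used for the `-F` message file. The helper name `report_encodings` is hypothetical. `locale.getdefaultlocale()` is deprecated since Python 3.11, so the sketch substitutes `locale.getpreferredencoding(False)`. That function reads the locale's codeset and may not always agree with `getdefaultlocale()[1]`. Falling back to `"ascii"` when no encoding is reported is also an assumption of this sketch.

```python
import codecs
import locale
import sys


def report_encodings():
    """Hypothetical helper: return (stdout codec, message-file codec) as canonical names.

    stdout codec: what a recode_stdout_to_file-style step assumes svn log output uses.
    file codec:   what it assumes svn expects in a file passed with -F.
    """
    # sys.stdout may be replaced by an object whose encoding is None (e.g. io.StringIO);
    # the "ascii" fallback is an assumption of this sketch, not svnmerge.py behavior
    stdout_enc = getattr(sys.stdout, "encoding", None) or "ascii"
    # Stand-in for locale.getdefaultlocale()[1], which is deprecated in Python 3.11+
    file_enc = locale.getpreferredencoding(False) or "ascii"
    # Canonicalize aliases such as "UTF8" vs "utf-8" so the comparison is meaningful
    return codecs.lookup(stdout_enc).name, codecs.lookup(file_enc).name


if __name__ == "__main__":
    out_enc, file_enc = report_encodings()
    print("svn log output is decoded as:", out_enc)
    print("message file is encoded as:", file_enc)
    if out_enc != file_enc:
        print("the codecs differ, so log text is transcoded between them")
```

Running the script once from a terminal and once with its output piped is worthwhile: under Python 2, `sys.stdout.encoding` is `None` when stdout is not a terminal, which is one way the first assumption can change without notice.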

On Fri, Oct 9, 2009 at 3:00 PM, Raman Gupta <rocketraman at fastmail.fm> wrote:

> Please keep replies on list...
>
> Benson Margulies wrote:
> > The point is that it only uses the encoding to write the file. It reads
> > the bytes from the log raw, and pushes them into the codec to write them
> > into the file. Thus, it is assuming that the input is UTF-8, and asking
> > for the output to be in the default locale. That's how the codecs work.
> > It isn't using a codec to convert from input, only to convert the output.
>
> I'm sorry Benson, but I believe you are operating under some
> fundamental misconceptions... Of course it has to use a codec to
> convert from input ("input" here is the svn log output).
>
> Any time one reads bytes that one knows are characters (as output by
> svn log), one needs to apply a codec to the bytes to understand what
> those characters are. You contradict yourself by saying that it is
> assuming the input is UTF-8 -- UTF-8 is just another codec, no
> different from other codecs except in the actual byte value(s) used to
> represent characters. Assuming UTF-8 would indeed mean using a codec
> to decode the input.
>
> Here is what it is really doing:
>
> def recode_stdout_to_file(s):
>    [... if statement snipped ...]
>    u = s.decode(sys.stdout.encoding)
>    return u.encode(locale.getdefaultlocale()[1])
>
> i.e. svnmerge.py is decoding the bytes of the svn log output using the
> codec returned by sys.stdout.encoding. This may be UTF-8, but it may
> be something else depending on your local platform and settings. There
> is *no assumption* of UTF-8 here. Then it is encoding those characters
> back into bytes (and eventually writing these bytes to a file), using
> the codec returned by locale.getdefaultlocale()[1]. This encoding is
> what svn expects in the content of files that it reads commit log
> messages from via the -F parameter.
>
> The possible error here is that our assumption of what encoding svn
> uses when printing a log to stdout (i.e. sys.stdout.encoding) or what
> encoding svn uses when reading a commit log file for creating a commit
> message (i.e. locale.getdefaultlocale()[1]) is wrong. If either of
> these assumptions is wrong, then yes, there is a problem that needs to
> be fixed. It has nothing to do with "assuming" UTF-8.
>
> > And this makes sense. It's completely wrong to assume that the svn log
> > messages are in the current user's default locale encoding. It
> > makes some sense that users would want to edit a file in their current
> > encoding, it just doesn't always work.
>
> Huh? Do you have some evidence that svn, when writing a commit log to
> standard output, does not write the data in the encoding specified by
> the python sys.stdout.encoding value? If so, great -- please provide
> such evidence and a patch with your fix.
>
> Cheers,
> Raman
>
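
The decode-then-encode step quoted above can be sketched in Python 3 with both codecs passed in explicitly, rather than read from `sys.stdout.encoding` and `locale.getdefaultlocale()[1]`. The function name `recode` and the sample strings are illustrative assumptions, and the snipped `if` statement from svnmerge.py is not reproduced. The sketch shows the two failure modes discussed in the thread. Decoding with the wrong codec silently produces mojibake. Encoding a character that the target codec lacks raises `UnicodeEncodeError`.

```python
def recode(raw: bytes, input_encoding: str, output_encoding: str) -> bytes:
    """Hypothetical Python 3 analogue of recode_stdout_to_file.

    Decode the bytes printed by `svn log` with the codec they were written in,
    then encode the resulting text with the codec svn uses to read a -F file.
    """
    text = raw.decode(input_encoding)     # bytes -> characters
    return text.encode(output_encoding)   # characters -> bytes


log_bytes = "café".encode("utf-8")        # b'caf\xc3\xa9'

# Correct input assumption: é survives as the single Mac Roman byte 0x8E
converted = recode(log_bytes, "utf-8", "mac_roman")   # b'caf\x8e'

# Wrong input assumption: no exception is raised, the text is simply garbled
garbled = log_bytes.decode("mac_roman")               # 'caf√©'

# Character with no Mac Roman representation: encoding fails loudly
try:
    recode("日本".encode("utf-8"), "utf-8", "mac_roman")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```

Only the decode codec determines whether the characters are understood correctly. The encode codec only determines whether svn can read them back from the `-F` file.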