[Svnmerge] Unicode in log messages

Raman Gupta rocketraman at fastmail.fm
Fri Oct 9 12:35:37 PDT 2009


Benson Margulies wrote:
> Raman,
> 
> I messed up by not rereading the code before writing that message. Yes,
> of course, there's a decode. But it's a no-op, since sys.stdout.encoding
> is UTF-8 on the machines I have access to.

It is on my Linux box too, but Romulo Ceccon who originally reported
this issue had a Windows box that returned a different value. I don't
remember what it was, but it's in the list archives.

> sys.stdout.encoding is UTF-8 for me.
> 
> /Users/benson/x/verint/rex-ws/target python                   
> Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)
> [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys;
> >>> print sys.stdout.encoding;
> UTF-8
> >>>
> >>> import locale;
> >>> print locale.getdefaultlocale()[1];
> 
> Here's the patch that works fine for me. The critical change is to avoid
> locale.getdefaultlocale() if you want to preserve the
> 
> def recode_stdout_to_file(s):
>     if locale.getdefaultlocale()[1] is None \
>             or not hasattr(sys.stdout, "encoding") \
>             or sys.stdout.encoding is None:
>         return s
>     u = s.decode(sys.stdout.encoding)
>     #return u.encode(locale.getdefaultlocale()[1])
>     return u.encode("utf-8")
> 
> Since svn is not a python program, it is not obvious to me how
> sys.stdout.encoding is related to how it writes. Practically, it seems
> to write UTF-8, and then the rest of this works for me.

OK, your patch changes the encoding that svnmerge uses to write the
commit log message to file, hard-coding it to UTF-8. It does not
change the encoding that svnmerge uses to read the output from svn,
so sys.stdout.encoding is irrelevant here.
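
To make the distinction concrete, here is a minimal sketch of the two
steps. It is not the svnmerge code: the function name recode, its
parameters and the sample string are made up for illustration, and it
assumes the bytes coming from svn really are UTF-8.

```python
def recode(raw, read_encoding, write_encoding):
    """Decode bytes read from svn, then re-encode them for the log file.

    Hypothetical helper for illustration only; not part of svnmerge.
    """
    # Step 1: how the bytes coming out of svn are interpreted.
    text = raw.decode(read_encoding)
    # Step 2: how the log message file is written for svn to read back.
    return text.encode(write_encoding)


# Pretend svn printed a log message containing U+00E9, encoded as UTF-8.
svn_output = u"caf\u00e9".encode("utf-8")

# Same input and same read encoding, but different bytes land in the file
# depending only on the write encoding.
as_utf8 = recode(svn_output, "utf-8", "utf-8")      # b"caf\xc3\xa9"
as_latin1 = recode(svn_output, "utf-8", "latin-1")  # b"caf\xe9"
```

Only the second argument corresponds to sys.stdout.encoding; the third
is the one your patch pins to UTF-8.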

svn itself, as far as I know, does not assume UTF-8 when reading this
file, therefore neither can svnmerge assume UTF-8 when writing it (as
your patch does).

Rather, I believe, based on our testing last year and some
conversations on #svn, that svn reads the file using the encoding
returned by the C equivalent of this python snippet:

import locale, os, sys
locale.setlocale(locale.LC_ALL, '')
if os.environ.has_key("LC_ALL"):
     del os.environ["LC_ALL"]
print locale.getdefaultlocale()[1]

What does this return on your machine? On my machine, this also
returns UTF-8, which makes the existing code effectively equivalent
to your patch:

Python 2.5.2 (r252:60911, Sep 30 2008, 15:42:03)
[GCC 4.3.2 20080917 (Red Hat 4.3.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import locale, os, sys
>>> locale.setlocale(locale.LC_ALL, '')
'en_US.UTF-8'
>>> if os.environ.has_key("LC_ALL"):
...      del os.environ["LC_ALL"]
...
>>> print locale.getdefaultlocale()[1]
UTF8
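
Note that the value printed above is spelled UTF8, not UTF-8. Python
treats both spellings as aliases for the same codec, which can be
checked with the standard codecs module. This is only a small sketch;
the helper name same_codec is made up for this example.

```python
import codecs


def same_codec(name_a, name_b):
    """Return True if both encoding names resolve to the same Python codec."""
    # codecs.lookup() normalizes aliases, so compare the canonical names.
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


print(same_codec("UTF8", "UTF-8"))    # True: both are aliases of one codec
print(same_codec("UTF8", "latin-1"))  # False: different codecs
```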

Cheers,
Raman
