[Svnmerge] Encoding problem in svnmerge commit message file

Raman Gupta rocketraman at fastmail.fm
Thu Dec 6 17:14:31 PST 2007


Thomas Heller wrote:
> Romulo A. Ceccon schrieb:
>> I write log messages for my Subversion repository in Portuguese, so
>> the character encoding matters. The SVN client seems to assume a
>> sensible encoding based on whether the message comes from a file
>> (CP1252) or from the command line (CP850). SVN output is also
>> consistent (CP850). svnmerge, however, does not take those issues
>> into account when generating the log message. The message is taken
>> from the SVN client output (CP850) and written directly to
>> svnmerge-commit-message.txt. When I use the command "svn commit -F
>> svnmerge-commit-message.txt" SVN thinks the encoding is CP1252 and
>> ends up writing a misencoded log message to the repository. To
>> work around the problem I must remember to use the "--encoding IBM850"
>> option.
> 
> While I support the use of ctypes as much as possible ;-), would
> not this approach be better?  It might even work on linuxes...
> 
> Thomas
> 
> +def convert_oem(s):
> +    u = s.decode(sys.stdout.encoding)
> +    return u.encode(sys.getfilesystemencoding())
> +

Thomas, why did you use sys.getfilesystemencoding()? Based on the Python
docs, getfilesystemencoding() seems to be the encoding used for
filenames, rather than for file contents?

Basically, what we need to be sure about is which encoding svn commit
defaults to when it reads log message files, and which Python function
returns that same encoding on every platform.

I guess we should also confirm that Python's sys.stdout.encoding always
corresponds to the encoding of the svn log output, across all platforms.
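
One way to gather data for both questions is to print the candidate
encodings on each platform and compare them with what svn actually does.
This is only a sketch: report_encodings() is a hypothetical helper, and
whether any of these values (in particular locale.getpreferredencoding())
matches svn's default for "commit -F" is an assumption that still has to
be verified per platform.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports, keyed by where they come from."""
    return {
        # Encoding of the console/terminal; may be None when redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding Python uses for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding the user's locale prefers for text data.
        "locale": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%-10s %s" % (name, value))
```

Running this on the Windows setup from the original report and on a few
Unix boxes would show which value lines up with CP850 and which with
CP1252.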

At least on my Linux box, this change should not cause any issues,
because sys.stdout.encoding and sys.getfilesystemencoding() are both
UTF-8, so in essence the additional code is a no-op (although the patch
could perhaps check whether the input and output encodings differ
before doing the decoding/encoding):

$ python
Python 2.4.4 (#1, Oct 23 2006, 13:58:18)
[...]
 >>> import sys
 >>> print sys.stdout.encoding
UTF-8
 >>> print sys.getfilesystemencoding()
UTF-8
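
The check suggested above could look roughly like the following. This is
a sketch, not the patch itself: unlike Thomas's convert_oem(s), this
variant takes the source and destination encodings as explicit
parameters (my assumption, so that the question of which encodings to
use stays open), and it relies on codecs.lookup(...).name to compare
encoding names, which needs Python 2.5 or later.

```python
import codecs


def convert_oem(data, src_encoding, dst_encoding):
    """Re-encode a byte string from src_encoding to dst_encoding.

    Returns the input unchanged when both names refer to the same codec.
    """
    # Normalize aliases so that e.g. "UTF-8" and "utf8" compare equal.
    src = codecs.lookup(src_encoding).name
    dst = codecs.lookup(dst_encoding).name
    if src == dst:
        return data
    return data.decode(src).encode(dst)
```

For the case in the original report, the call would be something like
convert_oem(message, "cp850", "cp1252"); on a box where both encodings
are UTF-8 the function returns the message untouched.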

Cheers,
Raman



