I am trying to commit a revision with Subversion from cmd.exe. The console code page is UTF-8 (set with chcp 65001):

c:\path\to\work\dir> svn ci

Since I have not specified a message with the -m flag, and the variable SVN_EDITOR is set to gvim, gvim opens and I can enter my message. I save the file as UTF-8 (:set fileencoding=utf8) and quit the editor.

Now the svn client (?) tells me: Auf ... .folgte ein nicht-ASCII Byte 195, das nicht von/nach UTF-8 konvertiert werden konnte (literally: "... was followed by a non-ASCII byte 195 that could not be converted to/from UTF-8"; I believe the English message is: Non-ASCII character (code %d) detected, and unable to convert to/from UTF-8).

This is strange since I am quite sure that the message file I stored is in UTF-8 format.

I also tried storing it in latin-1, but with the same effect.

Edit

I did a test with the message ü. The hex content of the file is

0000000: c3bc 0d0a 2d2d 2044 6965 7365 2075 6e64  ....-- Diese und
0000010: 2064 6965 2066 6f6c 6765 6e64 656e 205a   die folgenden Z
0000020: 6569 6c65 6e20 7765 7264 656e 2069 676e  eilen werden ign
0000030: 6f72 6965 7274 202d 2d0d 0a0d 0a41 2020  oriert --....A
0000040: 2020 780d 0a                               x..

Note the first four bytes (ü followed by \x0d\x0a). The ü is encoded as c3 bc, which is the UTF-8 representation of LATIN SMALL LETTER U WITH DIAERESIS, i.e. the desired ü.

Note also that the error message (in this new case: Ein Nicht-ASCII Zeichen (Kode 195) wurde gefunden, das nicht von/nach UTF-8 konvertiert werden konnte, i.e. "A non-ASCII character (code 195) was found that could not be converted to/from UTF-8") complains about 195, which is decimal for c3, the very first byte in the file. Of course, the error message is right that this is not an ASCII character, but is that not the whole point of using UTF-8 files?
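To double-check the byte arithmetic above, here is a minimal Python sketch (not part of the original setup; the helper name `is_ascii` is made up for illustration). It encodes the test message ü as UTF-8 and looks at the first byte that the error message complains about.

```python
# Encode the test message and inspect the byte the error message mentions.
message = "ü"
utf8_bytes = message.encode("utf-8")
first_byte = utf8_bytes[0]

print(utf8_bytes.hex())  # c3bc
print(first_byte)        # 195


def is_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


print(is_ascii(utf8_bytes))  # False
```

So 195 is simply the UTF-8 lead byte of ü; any tool that treats the file as plain ASCII will stumble on it.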

Edit 2

I tried to commit the message in UTF-8 format because I believed this to be the most natural thing to do. Obviously SVN, at least on cmd.exe, doesn't think so. I couldn't care less what format I need to commit the message in, as long as I can commit an ü and other German special characters.

René Nyffenegger

4 Answers

It looks like the svn commit command actually accepts an argument to tell SVN what encoding your commit message is in. Try svn commit --encoding UTF-8.

http://svnbook.red-bean.com/en/1.7/svn.ref.svn.html says:

--encoding ENC

Tells Subversion that your commit message is composed using the character encoding provided. The default character encoding is derived from your operating system's native locale; use this option if your commit message is composed using any other encoding.
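As a rough illustration of why the declared encoding matters, the sketch below shows the general decode-then-re-encode idea in Python. It is not Subversion's actual code: the function name `to_utf8` is hypothetical, and cp850 is only assumed here as an example of a native Windows console code page.

```python
def to_utf8(message_bytes: bytes, declared_encoding: str) -> bytes:
    """Interpret the bytes in the declared encoding, then re-encode as UTF-8."""
    return message_bytes.decode(declared_encoding).encode("utf-8")


saved_by_editor = b"\xc3\xbc"  # "ü" as written by an editor saving UTF-8

# Declared correctly: the message survives unchanged.
ok = to_utf8(saved_by_editor, "utf-8")

# Assumed to be in a different (here: cp850) encoding: each byte is
# misread as its own character, so the result is no longer "ü".
garbled = to_utf8(saved_by_editor, "cp850")

# Treated as strict ASCII: conversion fails on the first byte (0xc3 = 195).
try:
    to_utf8(saved_by_editor, "ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(ok == saved_by_editor, garbled == saved_by_editor, ascii_failed)
```

The point is only that the same bytes give different results depending on which encoding the reader assumes, which is what --encoding lets you state explicitly.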

Ben

I don't know if it will work, but you can also try :set bomb in gvim to include a BOM (byte order mark) in the file when you save. Some programs use a BOM to detect that a file is Unicode. I'm not sure whether SVN falls into that category.
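For reference, this is what a UTF-8 BOM looks like at the byte level. The snippet is a small Python sketch, using Python's `utf-8-sig` codec as a stand-in for an editor that writes a BOM; it says nothing about how SVN itself treats these bytes.

```python
import codecs

message = "ü"

# "utf-8-sig" prepends the UTF-8 byte order mark when encoding.
with_bom = message.encode("utf-8-sig")

print(list(codecs.BOM_UTF8))  # [239, 187, 191]
print(with_bom.hex())         # efbbbfc3bc
```

With a BOM, the first byte of the file becomes 239 (0xef) instead of 195 (0xc3), so a reader that rejects non-ASCII bytes will still reject the file, just one byte earlier.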

Ben
  • This does not work: the *BOM* is `0xef 0xbb 0xbf` (decimal 239 187 191). SVN doesn't manage to read the very first byte, 239 (`Ein Nicht-ASCII Zeichen (Kode 239) wurde gefunden, das nicht von/nach UTF-8 konvertiert werden konnte`) – René Nyffenegger Nov 08 '13 at 06:38
  • That's a shame. It was worth a shot I guess. – Ben Nov 10 '13 at 04:09

You can try iconv for Windows to convert the message file to the console code page:

File before conversion:

ü
-- Diese und die folgenden Zeilen werden ignoriert --

hexdump:

00000000  c3 bc 0d 0a 2d 2d 20 44  69 65 73 65 20 75 6e 64  |ü..-- Diese und|
00000010  20 64 69 65 20 66 6f 6c  67 65 6e 64 65 6e 20 5a  | die folgenden Z|
00000020  65 69 6c 65 6e 20 77 65  72 64 65 6e 20 69 67 6e  |eilen werden ign|
00000030  6f 72 69 65 72 74 20 2d  2d 0d 0a                 |oriert --..|

conversion command:

<utf8.txt iconv -f utf-8 -t 850>ascii.txt

result:

ü
-- Diese und die folgenden Zeilen werden ignoriert --

hexdump:

00000000  81 0d 0a 2d 2d 20 44 69  65 73 65 20 75 6e 64 20  |...-- Diese und |
00000010  64 69 65 20 66 6f 6c 67  65 6e 64 65 6e 20 5a 65  |die folgenden Ze|
00000020  69 6c 65 6e 20 77 65 72  64 65 6e 20 69 67 6e 6f  |ilen werden igno|
00000030  72 69 65 72 74 20 2d 2d  0d 0a                    |riert --..|

The console code page was 850 throughout.
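If iconv is not at hand, the same conversion can be sketched in Python. This is only an illustrative equivalent of the command above; the function name `utf8_to_cp850` is made up, and the byte string mirrors the example file from the hexdump.

```python
def utf8_to_cp850(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them in code page 850."""
    return data.decode("utf-8").encode("cp850")


# Contents of the example message file before conversion (UTF-8, CRLF line ends).
before = b"\xc3\xbc\r\n-- Diese und die folgenden Zeilen werden ignoriert --\r\n"
after = utf8_to_cp850(before)

print(after[:3].hex())          # 810d0a
print(len(before), len(after))  # the two-byte ü shrinks to one byte
```

The two-byte sequence c3 bc becomes the single byte 81, which matches the second hexdump.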

Endoro

Add to .bashrc (or similar):

export LANG="de_DE.utf8"
export LANGUAGE="de_DE.utf8"
export LC_ALL="de_DE.utf8"

svn uses the encoding defined in the environment.
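To see which character encoding a locale value such as de_DE.utf8 names, the part after the dot can be pulled out and normalized. The Python helper below is a hypothetical sketch (`codeset_from_locale` is not part of svn or any library), and it only parses the string; it does not query how svn actually resolves the locale.

```python
import codecs
from typing import Optional


def codeset_from_locale(value: str) -> Optional[str]:
    """Return the normalized codec name from a value like 'de_DE.utf8@euro'.

    Returns None if the value names no codeset or Python does not know it.
    """
    if "." not in value:
        return None
    # Keep what follows the first dot and drop any "@modifier" suffix.
    codeset = value.split(".", 1)[1].split("@", 1)[0]
    try:
        return codecs.lookup(codeset).name
    except LookupError:
        return None


print(codeset_from_locale("de_DE.utf8"))  # utf-8
print(codeset_from_locale("C"))           # None
```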

Tomasz Brzezina