I converted my CVS repository to SVN with the cvs2svn tool, but all my Unicode text files were changed to UTF-8, and I don't want that.

How can I avoid that? Is there a flag or parameter to keep my Unicode files?

dda
Dedanan
    You realize that Unicode is not an encoding, and that UTF-8 is part of Unicode? – dda Jun 05 '13 at 15:55

2 Answers

I assume that what you mistakenly refer to as Unicode is UTF-16LE. There is an option in cvs2svn, and it's in the documentation:

--encoding=ENC

Use ENC as the encoding for filenames, log messages, and author names in the CVS repos. (By using an --options file, it is possible to specify one set of encodings to use for filenames and a second set for log messages and author names.) This option may be specified multiple times, in which case the encodings are tried in order until one succeeds. Default: ascii. Other possible values include the standard Python encodings.

So you could try passing --encoding=utf_16_le on the command line.
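
Since the documentation says the accepted values are the standard Python encodings, one quick sanity check before running the conversion is to ask Python whether it recognises the name. This is only a sketch: `is_known_encoding` is a hypothetical helper, not part of cvs2svn, and it says nothing about how cvs2svn will actually apply the encoding.

```python
import codecs


def is_known_encoding(name: str) -> bool:
    """Return True if Python's codec registry recognises `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # codecs.lookup raises LookupError for unknown encoding names
        return False


print(is_known_encoding("utf_16_le"))            # True
print(is_known_encoding("not_a_real_encoding"))  # False
```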

dda

The encoding Windows (misleadingly) refers to as "Unicode" is UTF-16LE. This is a troublesome encoding because it is not ASCII-compatible; Windows adopted it because at the time (before UTF-8 was invented) it was expected to be the most common encoding for Unicode text. Today UTF-8 is overwhelmingly the preferred encoding for in-file Unicode storage.
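
To illustrate what "not ASCII-compatible" means here, a small Python sketch comparing the bytes produced for the same plain-ASCII string (the sample string is arbitrary):

```python
text = "diff"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# For pure-ASCII text, UTF-8 produces exactly the ASCII bytes.
print(utf8_bytes == ascii_bytes)   # True

# UTF-16LE uses two bytes per character here, so every other byte is NUL.
print(utf16_bytes)                 # b'd\x00i\x00f\x00f\x00'
print(b"\x00" in utf16_bytes)      # True
```

Those embedded NUL bytes are what trip up tools that expect ASCII-like text.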

Whilst dda's answer should probably work (+1), Subversion does not support handling UTF-16 files as text - they'll be handled as binary files which means you won't get usable diff/patch/merge. For this reason I would strongly recommend letting cvs2svn go ahead and change the files to UTF-8.
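
If you end up with stray UTF-16 files and want them as UTF-8 anyway, the conversion itself is simple. A minimal sketch, assuming the input really is UTF-16 and that files without a byte order mark are little-endian (as Windows writes them); `utf16_to_utf8` is a hypothetical helper, not something cvs2svn or Subversion provides:

```python
import tempfile
from pathlib import Path


def utf16_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a UTF-16 file as UTF-8, leaving line endings untouched."""
    raw = src.read_bytes()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        # A BOM is present: the "utf-16" codec reads it and strips it.
        text = raw.decode("utf-16")
    else:
        # Assumption: no BOM means little-endian.
        text = raw.decode("utf-16-le")
    # Work in bytes so no newline translation happens on write.
    dst.write_bytes(text.encode("utf-8"))


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.txt"
    dst = Path(tmp) / "out.txt"
    src.write_bytes("héllo\r\n".encode("utf-16-le"))
    utf16_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted == "héllo\r\n".encode("utf-8"))  # True
```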

bobince