Currently I have a local Subversion repository with a lot of commit messages in cp1251 encoding.

Is there any way I can convert all commit messages into utf-8 encoding?

Regent

2 Answers

As Rup says, Subversion should convert all log messages to UTF-8 before storing them in the repository, and back to the local encoding for display. If your log messages aren't being converted correctly, either:

  • Make sure your locale setting correctly identifies the encoding you're using; or,
  • Use the `--encoding` option to state the encoding of the log message explicitly.
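
As a rough illustration of the second option, the sketch below assembles an `svn commit` command line that names the message encoding explicitly. The helper name `build_commit_command` and the file name `message.txt` are made up for this example; `--encoding` and `-F` are standard `svn commit` options.

```python
import locale


def build_commit_command(message_file, encoding=None):
    """Return an argv list for `svn commit` that states the message encoding.

    Hypothetical helper: if no encoding is given, fall back to whatever the
    current locale reports, which is roughly what svn would assume by default.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return ["svn", "commit", "--encoding", encoding, "-F", message_file]


if __name__ == "__main__":
    # Only print the command here; pass the list to subprocess.run to execute it.
    print(" ".join(build_commit_command("message.txt", "cp1251")))
```

If the locale is already set correctly, the explicit option should not be needed.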
mazaneicha
  • Looks like it is actually a problem with one of the console tools not treating UTF-8 correctly rather than with SVN repository itself. – Regent Apr 12 '11 at 09:09

Your commit messages are already stored as UTF-8:

Subversion internally handles certain bits of data—for example, property names, pathnames, and log messages—as UTF-8-encoded Unicode. This is not to say that all your interactions with Subversion must involve UTF-8, though. As a general rule, Subversion clients will gracefully and transparently handle conversions between UTF-8 and the encoding system in use on your computer, if such a conversion can meaningfully be done (which is the case for most common encodings in use today).

If you've somehow double-encoded them, though, and you're using an FSFS-style repository, the easiest way will probably be to work through all the revprop files that you find in db/revprops/*/* underneath your repository and rewrite them with the correct encoding, e.g. using the iconv command-line tool from GnuWin32. (Note that these files should have Unix line endings, i.e. LF, not CRLF.)
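
To make the two possible repairs concrete, here is a minimal Python sketch of the text conversion only. It does not read or write revprop files, and both function names are invented for this example. The first function covers messages that were stored as raw cp1251 bytes; the second assumes a message whose UTF-8 bytes were misread as cp1251 and then encoded to UTF-8 a second time.

```python
def cp1251_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that are really cp1251 text as UTF-8."""
    return raw.decode("cp1251").encode("utf-8")


def undo_double_encoding(raw: bytes) -> bytes:
    """Undo one round of 'UTF-8 misread as cp1251, then saved as UTF-8'.

    Decoding as UTF-8 yields the garbled text; encoding that text back to
    cp1251 recovers the original UTF-8 bytes. Raises UnicodeError if the
    input does not fit this assumed pattern.
    """
    return raw.decode("utf-8").encode("cp1251")


if __name__ == "__main__":
    original = "Тест"
    # Simulate the double-encoding accident, then repair it.
    damaged = original.encode("utf-8").decode("cp1251").encode("utf-8")
    repaired = undo_double_encoding(damaged)
    print(repaired.decode("utf-8") == original)
```

Try it on a copy of one message first, and back up the repository before changing any stored properties.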

Rup
  • I was looking into the `revs` directory instead of the `revprops` directory, where log messages are indeed stored as UTF-8. Looks like the problem I'm having is actually with one of the console tools not working with UTF-8 correctly. – Regent Apr 12 '11 at 09:08
  • For input or output? If input, then you probably want `--encoding` as mazaneicha says; for output you might do better with the `--xml` output which is (supposed to be!) always UTF-8. – Rup Apr 12 '11 at 09:23
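
For the output side, one way to avoid the console's encoding altogether is to read the `--xml` output as bytes and let an XML parser decode it. The sketch below is only an illustration: `parse_svn_log_xml` is a made-up name, and the sample document is a hand-written stand-in for what `svn log --xml` prints, not captured output.

```python
import xml.etree.ElementTree as ET


def parse_svn_log_xml(xml_bytes):
    """Return (revision, message) pairs from `svn log --xml` style output."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for entry in root.iter("logentry"):
        revision = int(entry.get("revision"))
        message = entry.findtext("msg", default="")
        entries.append((revision, message))
    return entries


if __name__ == "__main__":
    # Hand-written sample; real input could come from
    # subprocess.run(["svn", "log", "--xml"], capture_output=True).stdout
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<log><logentry revision="7"><author>regent</author>'
        '<msg>Тест</msg></logentry></log>'
    ).encode("utf-8")
    print(parse_svn_log_xml(sample))
```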