ID3v1 tags support only the latin1 encoding, so cp1251 is used in practice to write V1 tags with Russian characters. I would like to copy data from V2 tags (Unicode) to V1 tags. Using eyeD3, I read the V2 tag and then try to write the V1 tag with the following code:

import eyeD3

tag = eyeD3.Tag()
tag.link(mp3path, v=eyeD3.ID3_V2)   # read the ID3v2 tag
mp3album_v2 = tag.getAlbum()        # unicode value
...
tag.link(mp3path, v=eyeD3.ID3_V1)   # switch to the ID3v1 tag
tag.setTextEncoding(eyeD3.LATIN1_ENCODING)
tag.setAlbum(mp3album_v2.encode('cp1251')) # ???
tag.update()

The value read from the V2 tag looks like this:

>>> print mp3album_v2
Жить в твоей голове

>>> print type(mp3album_v2)
<type 'unicode'>

>>> print repr(mp3album_v2)
u'\u0416\u0438\u0442\u044c \u0432 \u0442\u0432\u043e\u0435\u0439 \u0433\u043e\u043b\u043e\u0432\u0435'

It looks like setAlbum expects a UTF-8 string (?):

def setAlbum(self, a):
    self.setTextFrame(ALBUM_FID, self.strToUnicode(a));

def strToUnicode(self, s):
    t = type(s);
    if t != unicode and t == str:
        s = unicode(s, eyeD3.LOCAL_ENCODING);
    elif t != unicode and t != str:
        raise TagException("Wrong type passed to strToUnicode: %s" % str(t));
    return s;

But if I try tag.setAlbum(mp3album_v2.encode('cp1251').encode('utf-8')), I get an error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xc6 in position 0: invalid continuation byte

LA_
  • The [source code shows the `setTextFrame()` method decorated](http://eyed3.nicfit.net/_modules/eyed3/id3/frames.html#FrameSet.setTextFrame) with a `@requireUnicode()` decorator; presumably that means that the encoding is handled *elsewhere*. – Martijn Pieters Apr 28 '14 at 15:20
  • Putting two `encode` calls in sequence does *not* make sense. `encode` goes from `unicode` to `bytes` while `decode` goes from `bytes` to `unicode`. The expression `x.encode(y).encode(z)` *never* makes sense because it goes from `unicode` to `bytes` to `bytes` again. On python3 you'd get an `AttributeError` because `bytes` does not have the `encode` method anymore. – Bakuriu Apr 28 '14 at 15:21
  • The [`Tag.save()` method](http://eyed3.nicfit.net/api/eyed3.id3.html#eyed3.id3.tag.Tag.save) has an `encoding` keyword argument; clearly the library expects Unicode throughout and only when *saving* does encoding take place. – Martijn Pieters Apr 28 '14 at 15:25
  • @MartijnPieters, I don't use the `save()` method; I save the data with `update()`, which doesn't have the `encoding` keyword. So the main question is how to convert a `utf-8` value to `cp1251` and then to `unicode`. – LA_ Apr 28 '14 at 15:30
  • The API expects `unicode`, which means *not* UTF-8 or other encodings. You'd have to find a way to set a different encoding to be used when the tags are saved again. – Martijn Pieters Apr 28 '14 at 15:33

1 Answer

ID3v1 cannot reliably include any non-ASCII characters. You can write cp1251-encoded bytes into ID3v1 tags, but they will only render as Cyrillic on Russian-locale OS installs, and even then not in all applications.
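
To illustrate why the rendering depends on the reader's locale, here is a minimal sketch that decodes the same cp1251 bytes under two different code pages. It does not touch any MP3 file or eyeD3; the variable names are only for illustration, and cp1252 stands in for "a Western European reader".

```python
# -*- coding: utf-8 -*-
# The same bytes read back as different text depending on the assumed code page.
album = u'Жить'
raw = album.encode('cp1251')          # bytes as they would sit in the ID3v1 field

as_cyrillic = raw.decode('cp1251')    # reader that assumes a Russian code page
as_western = raw.decode('cp1252')     # reader that assumes a Western European code page

print(as_cyrillic == album)           # the Russian-locale reader recovers the text
print(as_western == album)            # the Western reader sees different characters
```

Nothing in the ID3v1 tag records which code page was used, so the reader can only guess.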

eyeD3 deals with Unicode strings internally and arbitrarily chooses latin1 (aka ISO-8859-1) as the encoding for ID3v1 tags. This probably isn't a good choice, because latin1 is never the default locale-specific encoding on a Windows box (for Western Europe it's actually cp1252, which is similar but not the same).

However, a useful property of latin1 is that each byte maps to the Unicode character with the same code point number. You can take advantage of this by building a Unicode string whose characters, when encoded as latin1, produce exactly the bytes of your real string in some other encoding.

album_name = u'Жить в твоей голове'
mangled_name = album_name.encode('cp1251').decode('latin1')
tag.setAlbum(mangled_name) # will encode as latin1, resulting in cp1251 bytes
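
As a sanity check of the trick above, the following sketch confirms that encoding the mangled string as latin1 yields the cp1251 bytes. It assumes only that the writer encodes the string as latin1, as described; `mangle_for_latin1` is a hypothetical helper name, not part of eyeD3.

```python
# -*- coding: utf-8 -*-
def mangle_for_latin1(text, target_encoding='cp1251'):
    """Hypothetical helper: return a Unicode string whose latin1 encoding
    equals `text` encoded in `target_encoding`."""
    return text.encode(target_encoding).decode('latin1')

album_name = u'Жить в твоей голове'
mangled = mangle_for_latin1(album_name)

# What a writer that encodes as latin1 would put into the tag.
written = mangled.encode('latin1')

assert written == album_name.encode('cp1251')    # the tag holds cp1251 bytes
assert written.decode('cp1251') == album_name    # a cp1251-aware reader recovers the text
```

If the text contains a character that cp1251 cannot represent, the first `encode` call raises `UnicodeEncodeError`, so the caller would need to decide how to handle that case.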

This is a horrible hack, of doubtful benefit, and one of the reasons you should avoid ID3v1.

bobince
  • Thanks a lot! Yes, this is the horrible hack, but this is exactly what I was looking for. – LA_ Apr 29 '14 at 06:20