I am reading an MP3 tag (ID3v1) with eyeD3 and would like to understand its encoding. Here is what I tried:
>>> print(type(mp3artist_v1))
<type 'unicode'>
>>> print(type(mp3artist_v1.encode('utf-8')))
<type 'str'>
>>> print(mp3artist_v1)
Zåìôèðà
>>> print(mp3artist_v1.encode('utf-8').decode('cp1252'))
ZÃ¥Ã¬Ã´Ã¨Ã°Ã 
>>> print(u'Zемфира'.encode('utf-8').decode('cp1252'))
ZÐµÐ¼Ñ„Ð¸Ñ€Ð°
If I use an online tool to decode the value, it says that the value ZÐµÐ¼Ñ„Ð¸Ñ€Ð° could be converted to the correct value Zемфира by changing encodings CP1252 → UTF-8, and the value Zåìôèðà by changing encodings CP1252 → CP1251.
What should I do to get Zемфира from mp3artist_v1? .encode('cp1252').decode('cp1251') works well, but how can I detect the right encoding automatically (only 3 encodings are possible: cp1251, cp1252, utf-8)? I was planning to use the following code:
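def forceDecode(string, codecs=('utf-8', 'cp1251', 'cp1252')):
    # Try each codec in turn and return the first successful decoding.
    for i in codecs:
        try:
            print(i)
            return string.decode(i)
        except UnicodeError:
            # This codec does not fit; try the next one.
            pass
    print("cannot decode url %s" % ([string]))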
But it does not help, since the value is already unicode: I have to encode it with one charset first and then decode it with another.
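
To make the encode-then-decode idea concrete, here is a minimal sketch of what I have in mind. The helper name fix_mojibake and its parameters are hypothetical, and it assumes the tag was wrongly decoded as cp1252 in the first place (if eyeD3 actually uses latin-1 for ID3v1, the wrong argument would need to change). It tries utf-8 before cp1251 because a strict utf-8 decode rarely succeeds by accident, while cp1251 accepts almost any byte sequence.

```python
def fix_mojibake(text, wrong='cp1252', candidates=('utf-8', 'cp1251')):
    """Hypothetical helper: undo a wrong decode, then try the likely real codecs."""
    try:
        # Recover the raw bytes, assuming `text` was decoded with `wrong`.
        raw = text.encode(wrong)
    except UnicodeEncodeError:
        # Characters outside `wrong` (e.g. real Cyrillic): leave the text alone.
        return text
    for codec in candidates:
        try:
            # The first codec that decodes without error wins.
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue
    # Nothing fit; fall back to the original text.
    return text
```

With the values above, fix_mojibake(mp3artist_v1) should give Zемфира, and so should fix_mojibake(u'ZÐµÐ¼Ñ„Ð¸Ñ€Ð°'). The obvious weakness is that a genuine cp1252 value such as Café cannot be told apart from cp1251 bytes this way and would come back as Cafй, so this only makes sense when non-ASCII tags are expected to be Cyrillic.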