(Edited: now referencing Unbaking mojibake)

Source file: Android phone .vcf contacts file

Destination: Windows 7 User Contacts file (imported .vcf)

Resulting contact info: Korean mojibake for the Name Field: '_곗퐫 李쏀__ㅻ━肄섏떎留_'

The result should be Korean text only. After a bit of research, I'm guessing there is an encoding problem related to EUC-KR, because of the Chinese characters in the mojibake. But I really have no idea...

Using code modified from Unbaking mojibake, this is the Python result:

_°ì¤Ô¤½¤É¤§ ì°½í__¤ë¦¬ì½¤Ô¤µ¤Â¤°¤Ô¤¨¤Â¤«ë§_

Obviously it's not correct. So still stuck...

import chardet

# ?'windows-1252' -> 949?
# Encodings tried for the first step (str -> bytes); only the last one is active:
# encoding1 = 'ISO-2022-KR'   # could not encode
# encoding1 = 'windows-1252'  # could not encode
# encoding1 = 'iso-8859-1'    # could not encode
# encoding1 = 'utf8'          # does nothing
encoding1 = 'euc-kr'

mojibake = '_곗퐫 李쏀__ㅻ━肄섏떎留_'

try:
    encoded_str = mojibake.encode(encoding1)
except UnicodeEncodeError:
    print("error: could not encode")
    encoded_str = None

if encoded_str:
    # Let chardet guess which encoding the raw bytes are really in
    detected_encoding = chardet.detect(encoded_str)["encoding"]
    print('detected_encoding', detected_encoding)
    try:
        correct_str = encoded_str.decode(detected_encoding)
    except (UnicodeDecodeError, TypeError, LookupError):
        # UnicodeDecodeError: bytes do not fit the guessed encoding
        # TypeError / LookupError: chardet returned None or an unknown name
        print("could not decode encoded_str as", detected_encoding)
    else:
        print(correct_str)
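One more variation that may be worth trying, as a sketch only: instead of letting chardet guess the second encoding, fix both sides by hand. This assumes (not confirmed here) that the .vcf was written as UTF-8 and that Windows read it using code page 949. The helper name `unbake` and the choice of `errors='replace'` are hypothetical. Note that `'cp949'` is used rather than `'euc-kr'`: Python's `euc-kr` codec appears to spell out syllables outside KS X 1001 as jamo sequences, which may be where the `¤Ô...` runs in the output above come from.

```python
def unbake(text, wrong='cp949', right='utf-8'):
    """Reverse a 'decoded with the wrong codec' mistake (hypothetical helper).

    Assumes `text` was produced by decoding `right`-encoded bytes as `wrong`.
    """
    # Step 1: get the raw bytes back by re-encoding with the codec that was
    # wrongly used for decoding; unencodable characters become '?'.
    raw = text.encode(wrong, errors='replace')
    # Step 2: decode those bytes with the encoding they were really in;
    # damaged byte sequences become U+FFFD instead of raising an error.
    return raw.decode(right, errors='replace')


mojibake = '_곗퐫 李쏀__ㅻ━肄섏떎留_'
candidate = unbake(mojibake)  # inspect with print(candidate)
```

If the assumption holds, this can only give a partial result: wherever the mojibake already shows `_`, the original bytes are gone, so those positions come back as replacement characters. Re-importing the .vcf with the correct charset would be the cleaner fix.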
hippo
  • Are you sure the text you are getting to begin with is actually wrong? – pvg Oct 08 '17 at 19:22
  • What would a correct version of your mojibake string look like? Only Korean, no Chinese characters? – lenz Oct 08 '17 at 19:42
  • The result should have no Chinese characters... The Korean letters currently showing in the mojibake are not correct either. – hippo Oct 10 '17 at 15:12
