0

I have a CSV file with some data, and one row contains text that was written to the file after being encoded as UTF-8.

This is the text:

"b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'"

I'm trying to recover the original characters from this text with the decode function, but it seems impossible.

Does anyone know the correct way to do this?

Vadim Kotov
Madmartigan

2 Answers

4

Assuming that the line in your file is exactly like this:

b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'

And reading the line from the file gives the output:

>>> line
"b'\\xe7\\x94\\xb3\\xe8\\xbf\\xaa\\xe8\\xa5\\xbf\\xe8\\xb7\\xaf255\\xe5\\xbc\\x84660\\xe5\\x8f\\xb7\\xe5\\x92\\x8c665\\xe5\\x8f\\xb7 \\xe4\\xb8\\xad\\xe5\\x9b\\xbd\\xe4\\xb8\\x8a\\xe6\\xb5\\xb7\\xe6\\xb5\\xa6\\xe4\\xb8\\x9c\\xe6\\x96\\xb0\\xe5\\x8c\\xba 201205'"

You can try the eval() function (only on input you trust, since eval() executes whatever code the string contains):

with open(r"your_csv.csv", "r") as csvfile:
    for line in csvfile:
        # when you reach the desired line
        b = eval(line).decode('utf-8')

Output:

>>> print(b)
申迪西路255弄660号和665号 中国上海浦东新区 201205
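If the file might contain anything you did not write yourself, a more cautious variant of the same idea is `ast.literal_eval`, which only parses Python literals and does not run code. The following is a minimal sketch, not a definitive implementation; the helper name `decode_bytes_literal` and the shortened sample string are made up for illustration, and it assumes the text really is the `repr` of a UTF-8 encoded `bytes` object.

```python
import ast


def decode_bytes_literal(text: str) -> str:
    """Turn the text of a bytes literal such as "b'\\xe7\\x94\\xb3'" into a str.

    Hypothetical helper: assumes the literal holds UTF-8 encoded data.
    """
    # literal_eval parses the literal without executing arbitrary code.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal, got %r" % type(value))
    return value.decode("utf-8")


# Shortened sample: the first two characters of the address plus the postcode.
raw = r"b'\xe7\x94\xb3\xe8\xbf\xaa 201205'"
print(decode_bytes_literal(raw))
```

Malformed input raises `ValueError` or `SyntaxError` here instead of being executed, so you can catch those and skip the row.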
abybaddi009
0

Try this:

a = b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'
print(a.decode('utf-8'))  # your decoded output

Since you say you are reading from a file, you can try passing the encoding when you open it:

import codecs

# The with block closes the file; print() is a function in Python 3
with codecs.open('unicode.rst', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
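Note that opening the file with the right encoding does not by itself undo a stored `b'...'` literal: the cell still comes back as a `str` that merely looks like bytes. A rough sketch of handling this per cell with the `csv` module follows. The column names, the sample row, and the helper `decode_cell` are assumptions for illustration, and `io.StringIO` stands in for `open(path, "r", encoding="utf-8", newline="")` in Python 3.

```python
import ast
import csv
import io


def decode_cell(cell: str) -> str:
    """Decode a cell that holds the text of a bytes literal; leave others alone."""
    if cell.startswith(("b'", 'b"')):
        try:
            value = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            return cell  # looked like a literal but was not a valid one
        if isinstance(value, bytes):
            return value.decode("utf-8")
    return cell


# Hypothetical two-line CSV; the second column stores a bytes literal as text.
sample = io.StringIO(r'''id,address
1,"b'\xe4\xb8\xad\xe5\x9b\xbd 201205'"
''')

rows = [[decode_cell(cell) for cell in row] for row in csv.reader(sample)]
print(rows)
```

Cells that do not start with `b'` or `b"` pass through unchanged, so the same loop can run over every column.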
Narendra
  • 1
    I know that works. My problem is that I can not find the way to prepare the string. When I read the row I obtain "b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\..." But I need b'\xe7\x94\xb3\xe8\xbf\xaa\xe8...' – Madmartigan Feb 21 '18 at 10:05
  • @Madmartigan ok in that case i modified my answer...try with it – Narendra Feb 21 '18 at 10:29
  • 1
    @Narendra OP is asking about python-3. It's enough to use `open(path, 'r', encoding='utf-8')`. You don't have to use the codecs module. – viraptor Feb 21 '18 at 10:41