4

A Python library returns the text repr of UTF-8 byte strings as ordinary strings, like this:

In [1]: string
Out[1]: "b'\\xd0\\x9f\\xd1\\x80\\xd0\\xb5\\xd0\\xb4\\xd0\\xb8\\xd1\\x81\\xd0\\xbb\\xd0\\xbe\\xd0\\xb2\\xd0\\xb8\\xd0\\xb5'"

In [2]: type(string)
Out[2]: str

I need to recover the real text from them, as if they were actual bytes objects:

In [91]: string_b
Out[91]: b'\xd0\x9f\xd1\x80\xd0\xb5\xd0\xb4\xd0\xb8\xd1\x81\xd0\xbb\xd0\xbe\xd0\xb2\xd0\xb8\xd0\xb5'

In [92]: type(string_b)
Out[92]: bytes

In [93]: string_b.decode('UTF-8')
Out[93]: 'Предисловие'

How can I do that?

krvkir

2 Answers

3

Use ast.literal_eval to parse the string as if it were a Python literal:

import ast

# The text repr of a bytes object, as returned by the library
string = "b'\\xd0\\x9f\\xd1\\x80\\xd0\\xb5\\xd0\\xb4\\xd0\\xb8\\xd1\\x81\\xd0\\xbb\\xd0\\xbe\\xd0\\xb2\\xd0\\xb8\\xd0\\xb5'"
res = ast.literal_eval(string)  # parse the literal into a real bytes object
print(res)
# b'\xd0\x9f\xd1\x80\xd0\xb5\xd0\xb4\xd0\xb8\xd1\x81\xd0\xbb\xd0\xbe\xd0\xb2\xd0\xb8\xd0\xb5'
print(res.decode("UTF-8"))
# Предисловие
Netwave
1

If you have a string containing the repr of a binary string, you can go back using ast.literal_eval.
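As a rough sketch of that round trip, the helper below wraps ast.literal_eval and decodes the result. The name recover_text and its encoding parameter are made up for illustration and are not part of any library; the type check is an added assumption so that a string which is not a bytes repr fails loudly.

```python
import ast


def recover_text(dumped: str, encoding: str = "utf-8") -> str:
    """Turn the text repr of a bytes object back into a decoded str.

    recover_text is a hypothetical helper name, not a library function.
    """
    # literal_eval parses Python literals only; it does not run arbitrary code.
    value = ast.literal_eval(dumped)
    if not isinstance(value, bytes):
        # Assumption: anything other than a bytes literal is treated as an error.
        raise TypeError(f"expected a bytes literal, got {type(value).__name__}")
    return value.decode(encoding)


dumped = "b'\\xd0\\x9f\\xd1\\x80\\xd0\\xb5\\xd0\\xb4\\xd0\\xb8\\xd1\\x81\\xd0\\xbb\\xd0\\xbe\\xd0\\xb2\\xd0\\xb8\\xd0\\xb5'"
print(recover_text(dumped))  # Предисловие
```

If the input is malformed, ast.literal_eval raises ValueError or SyntaxError, so callers processing many dumped strings may want to catch those.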

Matteo Italia