2

I have the following problem:

I want to read a file into a raw binary string:

The file looks like this (with escape characters, not binary data):

\xfc\xe8\x82\x00\x00\x00\x60\x89\xe5\x31\xc0\x64\x8b\x50\x30\x8b\x52

code used:

data = open("filename", "rb").read()

result obtained:

b"\\xfc\\xe8\\x82\\x00\\x00\\x00\\x60\\x89\\xe5\\x31\\xc0\\x64\\x8b\\x50\\x30\\x8b\\x52"

With double backslashes.

How can I read it as a binary string of \xaa-style bytes (without the doubled escape characters)?

jfs
Symonen
  • Is your file the binary data represented by what you're showing, or is it ASCII text that *actually* looks like what you're showing us, which you want to interpret? – cge May 01 '16 at 18:28
  • The string is being read correctly; backslashes are represented in Python strings using the '\\' escape sequence. When you print the unformatted string (with the b"" prefix), the escape sequence characters are shown. Add more details to the question on how you plan to use the string to get more helpful answers. – Nitin Labhishetty May 01 '16 at 18:33
  • It is a simple text file; I should have written that, sorry. – Symonen May 01 '16 at 18:34

3 Answers

2

OK. Your problem here is that you're asking the wrong question. Your data file isn't a raw binary string; it's text that encodes the bytes as escape sequences. You're reading it as raw binary when you need to decode the escapes instead. Try

data = open("filename", "r", encoding='unicode_escape').read().encode('raw_unicode_escape')

instead.

Edit: ok, this now works. You need to encode into raw_unicode_escape, not utf-8 (the default).
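
A minimal sketch of the same idea that can be tried without a file on disk. The helper name `decode_escaped` is hypothetical, and the sketch assumes the text contains only ASCII characters and that every escape yields a code point below 256. It uses `latin-1` for the final step because that codec maps code points 0-255 one-to-one onto bytes.

```python
def decode_escaped(text: str) -> bytes:
    """Hypothetical helper: turn literal escape text into the bytes it describes."""
    # Drop surrounding whitespace, e.g. the trailing newline of a text file.
    text = text.strip()
    # Assumption: the text is pure ASCII, so this encode step cannot fail.
    ascii_bytes = text.encode("ascii")
    # unicode_escape interprets the backslash escapes; latin-1 then maps
    # each resulting code point (0-255) onto exactly one byte.
    return ascii_bytes.decode("unicode_escape").encode("latin-1")


# Simulates the content read from the text file in the question.
raw = decode_escaped(r"\xfc\xe8\x82\x00" + "\n")
print(raw)       # b'\xfc\xe8\x82\x00'
print(len(raw))  # 4
```

If the file could contain escapes above `\xff` (for example `\u1234`), the `latin-1` step would raise `UnicodeEncodeError`, which is arguably preferable to silently producing unexpected bytes.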

cge
0

This output is OK.

Python displays each backslash in a bytes object as \\ in its repr; that does not mean the file holds two backslashes. The data is stored as bytes, exactly as it appears in the file.
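
A quick way to check what is actually stored, as a small sketch: the literal below stands in for what `open("filename", "rb").read()` would return for a file containing the text `\xfc\xe8`.

```python
# Stand-in for the bytes read from a file whose text content is \xfc\xe8
data = b"\\xfc\\xe8"

print(data)                  # b'\\xfc\\xe8'  (repr: each backslash shown doubled)
print(len(data))             # 8  (each backslash is a single byte)
print(data.decode("ascii"))  # \xfc\xe8  (the text as it appears in the file)
print(data[0] == ord("\\"))  # True  (the first byte is one backslash)
```

So the bytes match the file, but they are the eight ASCII characters of the escape text, not the two bytes 0xfc and 0xe8 that the text describes.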

ForceBru
0

To convert four ASCII characters (\ x f c) from the file into a single byte (252 == 0xfc), read the ASCII characters as bytes (data = open("filename", "rb").read()), remove the \x prefixes, and convert the resulting hexadecimal bytestring into the bytes it represents:

>>> import binascii
>>> data = b'\\xfc\\xe8\\x82'
>>> binascii.unhexlify(data.replace(b'\\x', b''))
b'\xfc\xe8\x82'

It is best to avoid storing data as b'\\xfc' (4 bytes) instead of b'\xfc' (1 byte) in the first place.
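
Applied to an actual file, the approach above might look like the sketch below. The helper name `read_escaped_hex` is hypothetical, and the temporary file only stands in for the asker's text file. The sketch assumes the file holds nothing but `\xNN` sequences and whitespace; whitespace is removed first because `binascii.unhexlify` rejects a trailing newline.

```python
import binascii
import os
import tempfile


def read_escaped_hex(path):
    """Hypothetical helper: read a file of \\xNN text and return the raw bytes."""
    with open(path, "rb") as f:
        text = f.read()
    # Remove all whitespace (including a trailing newline), then the \x prefixes.
    hex_digits = b"".join(text.split()).replace(b"\\x", b"")
    return binascii.unhexlify(hex_digits)


# Create a stand-in for the text file from the question.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\\xfc\\xe8\\x82\n")

try:
    result = read_escaped_hex(tmp.name)
    print(result)  # b'\xfc\xe8\x82'
finally:
    os.remove(tmp.name)
```

Anything other than hex digits left after the replacement makes `unhexlify` raise `binascii.Error`, so malformed input fails loudly instead of being decoded incorrectly.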

jfs