python how to convert bytes to binary

Question

I'm trying to read a file's contents and convert them into what is actually stored in memory. If I write

file = open("filename","br")
binary = "0b"
for i in file.read():
    binary += bin(i)[2:]

will `binary` equal the actual value stored in memory? If so, how can I convert this back into a string?

EDIT: I tried

file = open("filename.txt","br")
binary = ""
for i in file.read():
    binary += bin(i)[2:]
stored = ""
for bit in binary:
    stored += bit
    if len(stored) == 7:
        print(chr(eval("0b"+stored)), end="")
        stored = ""

This worked fine until it reached a space; after that, the output turned into odd symbols and mixed-up letters.

It's not really clear what you're trying to do. `file.read()` is literally the bytes that are in the file. Could you give an example of what you think is in the file and what you want the result to look like? – Frank Yellin, Sep 12 '20 at 21:22
I'm trying to do this for any text file in general. Also, I want the result to be what's in the file, to prove to myself that I actually have the binary version for various purposes. – forever, Sep 12 '20 at 21:24
Also, you may not know that when you loop over a bytes object, each item is the integer value of that byte, like what `ord` returns for a character. – forever, Sep 12 '20 at 21:30

Mike67 · Accepted Answer · 2020-09-12T21:30:02.303

2

To get a (somewhat) accurate representation of the string as it is stored in memory, you need to convert each character into binary.

Assuming basic ASCII encoding (1 byte per character):

s = "python"
binlst = [bin(ord(c))[2:].rjust(8,'0') for c in s]  # remove '0b' from string, fill 8 bits
binstr = ''.join(binlst)

print(s)
print(binlst)
print(binstr)

Output

python
['01110000', '01111001', '01110100', '01101000', '01101111', '01101110']
011100000111100101110100011010000110111101101110
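
The question also asks how to turn the bits back into a string. A minimal sketch, assuming the bit string was built with exactly 8 bits per character as above; the helper name `bits_to_text` is made up for this illustration:

```python
def bits_to_text(bits: str) -> str:
    """Decode a string of 0s and 1s, 8 bits per character, back into text."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Slice into 8-bit chunks and parse each chunk as a base-2 integer.
    codes = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    return "".join(chr(code) for code in codes)


binstr = "011100000111100101110100011010000110111101101110"
print(bits_to_text(binstr))  # python
```

`int(chunk, 2)` does the same job as `eval("0b" + chunk)` from the question without evaluating arbitrary text.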

For Unicode text encoded as UTF-8, each character takes 1 to 4 bytes, so a fixed 8 bits per character no longer matches what is actually stored. As @Yellin mentioned, it may be easier to convert the file's bytes to binary directly.
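
Working on the file's bytes could look roughly like this. It is a sketch, not a definitive implementation: `file_to_bits`, `bits_to_bytes`, and the `demo.txt` file are hypothetical names for this example, and the text is assumed to be UTF-8 encoded.

```python
import tempfile
from pathlib import Path


def file_to_bits(path) -> str:
    """Return the file's bytes as one string of 0s and 1s, 8 bits per byte."""
    data = Path(path).read_bytes()
    # format(byte, "08b") pads each byte to 8 digits, keeping leading zeros.
    return "".join(format(byte, "08b") for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Inverse of file_to_bits: rebuild the original bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Hypothetical demo file containing a space and a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    demo.write_text("hi é", encoding="utf-8")
    bits = file_to_bits(demo)

print(bits)
print(bits_to_bytes(bits).decode("utf-8"))  # hi é
```

Because the round trip goes through bytes rather than characters, it does not matter how many bytes each character occupies; decoding happens only once, at the end.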

edited Sep 12 '20 at 21:30

answered Sep 12 '20 at 21:24

I found an interesting article describing how to determine how many bytes UTF-8 encoded characters need to be read: https://www.johndcook.com/blog/2019/09/09/how-utf-8-works/ – luthervespers Sep 13 '20 at 00:06
@Mike67 so the problem was that `bin` deletes trailing zeros so you need to add them back? – forever Sep 13 '20 at 18:07
It drops leading zeros, so 00001101 becomes 1101. You need to pad with zeros to fill 8 bits. – Mike67 Sep 13 '20 at 18:30
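
To make the leading-zero point concrete: lowercase ASCII letters happen to need exactly 7 bits, which is why splitting the unpadded string into 7-bit chunks looked right in the question. A space is 32, which needs only 6 bits, so every chunk after it is misaligned. A small illustration (the variable names are just for this example):

```python
text = "a b"

# bin() output with the "0b" prefix stripped: the width varies per character.
unpadded = [bin(ord(ch))[2:] for ch in text]
# format() with "08b": always 8 digits, leading zeros kept.
padded = [format(ord(ch), "08b") for ch in text]

print(unpadded)  # ['1100001', '100000', '1100010']
print(padded)    # ['01100001', '00100000', '01100010']
```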
