
I am working on a script that breaks another Python script down into blocks and uses pycrypto to encrypt the blocks (all of which I have done successfully so far). Now I am storing the encrypted blocks in a file so that the decrypter can read it and execute each block. The final result of the encryption is a list of binary outputs (something like blocks=[b'\xa1\r\xa594\x92z\xf8\x16\xaa',b'xfbI\xfdqx|\xcd\xdb\x1b\xb3',etc...]).

When I write the output to a file, the blocks all end up on one giant line, so when I read the file back, all the bytes come back as one giant line instead of as the individual items of the original list. I also tried converting the bytes to a string and adding a '\n' at the end of each one, but I still need the bytes, and I can't figure out how to convert the string back into the original bytes.

To summarize, I am looking to either write each binary item to a separate line in a file, so I can easily read the data and use it in the decryption, or translate the data to a string and, in the decryption, convert the string back into the original binary data.

Here is the code for writing to the file:

    new_file = open('C:/Python34/testfile.txt','wb')
    for byte_item in byte_list:
        # For the string variant, I replaced 'wb' with 'w' and
        # byte_item with ascii(byte_item) + '\n'
        new_file.write(byte_item)
    new_file.close()

and for reading the file:

    # Or 'r' instead of 'rb' if using string method
    byte_list = open('C:/Python34/testfile.txt','rb').readlines()
jfs

3 Answers


A file is a stream of bytes without any implied structure. If you want to load a list of binary blobs, you should store some additional metadata to restore the structure; for example, you could use the netstring format:

#!/usr/bin/env python
blocks = [b'\xa1\r\xa594\x92z\xf8\x16\xaa', b'xfbI\xfdqx|\xcd\xdb\x1b\xb3']

# save blocks
with open('blocks.netstring', 'wb') as output_file:
    for blob in blocks:
        # [len]":"[string]","
        output_file.write(str(len(blob)).encode())
        output_file.write(b":")
        output_file.write(blob)
        output_file.write(b",")

Read them back:

#!/usr/bin/env python3
import re
from mmap import ACCESS_READ, mmap

blocks = []
match_size = re.compile(br'(\d+):').match
with open('blocks.netstring', 'rb') as file, \
     mmap(file.fileno(), 0, access=ACCESS_READ) as mm:
    position = 0
    for m in iter(lambda: match_size(mm, position), None):
        i, size = m.end(), int(m.group(1))
        blocks.append(mm[i:i + size])
        position = i + size + 1 # shift to the next netstring
print(blocks)
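
A possible variant of the reader, sketched for cases where `mmap` is not an option (for example, data arriving from a pipe or socket). It reads from any binary stream and also checks the trailing comma, which the `mmap` version simply skips. The name `read_netstrings` is hypothetical, not part of any library.

```python
import io


def read_netstrings(stream):
    """Yield each payload from a binary stream of netstrings."""
    while True:
        digits = b''
        while True:
            ch = stream.read(1)
            if ch == b'' and not digits:
                return  # clean end of stream between netstrings
            if ch == b':':
                break
            if not ch.isdigit():
                raise ValueError('malformed netstring length')
            digits += ch
        size = int(digits)  # raises ValueError if no digits preceded ':'
        payload = stream.read(size)
        if len(payload) != size or stream.read(1) != b',':
            raise ValueError('truncated or malformed netstring')
        yield payload


# In-memory example; a file opened with 'rb' works the same way.
decoded = list(read_netstrings(io.BytesIO(b'5:hello,7:goodbye,')))
print(decoded)
```

Because each payload is read by its declared length, the blobs may contain any byte values, including `:`, `,` and newlines.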

As an alternative, you could consider the BSON format for your data or an ASCII armor format.
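
One simple take on the ASCII-armor idea, sketched under the assumption that base64 is an acceptable encoding: base64 output never contains a newline byte, so each block can be written on its own line and recovered line by line. The helper names and the file name `blocks.b64` are made up for illustration.

```python
import base64
import os
import tempfile


def save_blocks(path, blocks):
    """Write each blob as one base64-encoded line."""
    with open(path, 'wb') as f:
        for blob in blocks:
            f.write(base64.b64encode(blob) + b'\n')


def load_blocks(path):
    """Read the lines back and decode each into the original bytes."""
    with open(path, 'rb') as f:
        return [base64.b64decode(line.strip()) for line in f]


blocks = [b'\xa1\r\xa594\x92z\xf8\x16\xaa', b'xfbI\xfdqx|\xcd\xdb\x1b\xb3']
path = os.path.join(tempfile.mkdtemp(), 'blocks.b64')  # illustrative location
save_blocks(path, blocks)
restored = load_blocks(path)
print(restored == blocks)
```

The trade-off is size: base64 makes the stored data roughly a third larger than the raw bytes.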

jfs

I think what you're looking for is `byte_list = open('C:/Python34/testfile.txt', 'rb').read()`

If you know how many bytes each item is, you can use `read(number_of_bytes)` to process one item at a time.

`read()` will read the entire file, but then it is up to you to split those bytes back into their respective items.
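
As a rough sketch of the fixed-size case: if every item has the same known length, the file can be read back in chunks of that size. `BLOCK_SIZE = 16` is only an assumed value for illustration (the question does not state the size of the encrypted blocks), and `read_fixed_blocks` is a hypothetical helper.

```python
import os
import tempfile
from functools import partial

BLOCK_SIZE = 16  # assumed item size in bytes; not given in the question


def read_fixed_blocks(path, size=BLOCK_SIZE):
    """Return the file's contents as a list of `size`-byte chunks."""
    with open(path, 'rb') as f:
        # f.read(size) returns b'' at end of file, which stops iter().
        # If the file length is not a multiple of `size`, the last
        # chunk is shorter.
        return list(iter(partial(f.read, size), b''))


# Round trip with two made-up 16-byte items.
items = [bytes(range(16)), bytes(range(16, 32))]
path = os.path.join(tempfile.mkdtemp(), 'fixed.bin')
with open(path, 'wb') as f:
    for item in items:
        f.write(item)

restored = read_fixed_blocks(path)
print(restored == items)
```

This only works when all items really are the same length; variable-length items need some form of length prefix or delimiter.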

Eric Y

In general, since you're using Python 3, you will be working with bytes objects (which are immutable) and/or bytearray objects (which are mutable).

Example:

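b1 = bytearray('hello', 'utf-8')
print(b1)

b1 += bytearray(' goodbye', 'utf-8')
print(b1)

# Write the accumulated bytes to a binary file
with open('temp.bin', 'wb') as f:
    f.write(b1)

#------

# Read the whole file back as a single bytes object
with open('temp.bin', 'rb') as f:
    b2 = f.read()
print(b2)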

Output:

bytearray(b'hello')
bytearray(b'hello goodbye')
b'hello goodbye'
Jonathon Reinhart
  • How exactly does this solve my problem? This will just give me a giant byte; what I am looking for is for the receiving end to have a list of the bytes that went in (in your case, when I read/readline I would be able to easily derive `[b'hello',b'goodbye']`) – zozer_firehood Aug 05 '15 at 22:51
  • *"a giant byte"* - Lol "byte" almost always means "octet" these days, exactly 8 bits. If you're actually dealing with binary data, it would be incredibly inefficient in Python to have a `list` of individual byte values, which is why I suggested using `bytes` and `bytearray` objects. You haven't explained what kind of data you're actually trying to store and recover, so it's difficult to give better advice - especially because you refer to both "bytes" (implying binary data) and strings of text. – Jonathon Reinhart Aug 05 '15 at 23:14
  • I'll edit my question, and hopefully this will help you understand what I am looking for. What I am trying to accomplish is this: I have a Python script, and I am breaking the script down into blocks and using pycrypto to encrypt the blocks (all of which I have done successfully so far). Now I am storing the encrypted blocks in a file so that the decrypter can read it and execute each block. The final result of the encryption is a list of binary outputs – zozer_firehood Aug 05 '15 at 23:25