8

I have to read and write binary data, where each element has:

  • size = 2 bytes (16 bits)
  • encoding = signed two's complement
  • endianness = big or little (must be selectable)

Is this possible without using any external module? If so,

  1. How can I read such data from a binary file using read() into an array L of integers?
  2. How can I write an array of integers L into a binary file using write()?
psihodelia
  • Have you looked at Python's struct module? – Kai Feb 17 '11 at 15:30
  • I'd say the struct module would be the best place to start – Exelian Feb 17 '11 at 15:31
  • Using `struct` would be quite inefficient, though, because you would have to unpack the values one by one. – Sven Marnach Feb 17 '11 at 15:34
  • @Sven Marnach: Have you measured that? – S.Lott Feb 17 '11 at 15:47
  • @S.Lott: Yes, while answering [this question](http://stackoverflow.com/questions/4227990/fast-way-to-read-interleaved-data) last year. I don't remember the exact figures. – Sven Marnach Feb 17 '11 at 15:52
  • @Sven Marnach: """unpack the values one by one""" ? Consider `struct.unpack(byteorder + str(len(rawbytes) // 2) + "h", rawbytes)` where `byteorder` is `<` or `>` as desired. Note: I'm not claiming that this is faster than the `array` way, but I do note that the `array` way sometimes needs an additional `byteswap` step. – John Machin Feb 17 '11 at 20:49
  • @John: You are perfectly right, I did not remember you can use a repeat count. The measurements I have done for the linked question do not apply to this case. – Sven Marnach Feb 17 '11 at 21:03

5 Answers

12

I think you are best off using the array module. It stores data in system byte order by default, but you can use array.byteswap() to convert between byte orders, and you can use sys.byteorder to query the system byte order. Example:

import array
import sys

# Create an array of 16-bit signed integers
a = array.array("h", range(10))
# Write to file in big endian order
if sys.byteorder == "little":
    a.byteswap()
with open("data", "wb") as f:
    a.tofile(f)
# Read from file again
b = array.array("h")
with open("data", "rb") as f:
    b.fromfile(f, 10)
# The file is big endian, so swap back on little endian systems
if sys.byteorder == "little":
    b.byteswap()
Sven Marnach
2
from array import array
from os import stat
from sys import byteorder as system_endian  # 'little' or 'big'

def read_file(filename, endian):
    # endian is 'little' or 'big': the byte order of the data in the file
    count = stat(filename).st_size // 2  # number of 16-bit values in the file
    with open(filename, 'rb') as f:
        result = array('h')
        result.fromfile(f, count)
    # Swap only when the file's byte order differs from the system's
    if endian != system_endian:
        result.byteswap()
    return result
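The answer above only covers reading. A possible write counterpart, as a minimal sketch: `write_file` is a hypothetical name that does not appear in the original answer, and it assumes the `'h'` typecode is 2 bytes wide, which holds on mainstream platforms but is not guaranteed by the language.

```python
import sys
from array import array

def write_file(filename, values, endian):
    """Write ints as 16-bit signed values; endian is 'big' or 'little'.

    Hypothetical helper, mirroring read_file above.
    """
    # Building a fresh array copies the input, so the caller's data is
    # never byte-swapped. Values outside -32768..32767 raise OverflowError.
    data = array('h', values)
    if endian != sys.byteorder:
        data.byteswap()
    with open(filename, 'wb') as f:
        data.tofile(f)
```

Because the swap happens on a private copy, the list or array passed in can still be used normally after the call.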
Karl Knechtel
1

I found this useful for reading/writing the data from a binary file into a numpy array:

import sys
import numpy as np

endian = sys.argv[1] # Pass endian ('big' or 'little') as an argument to the program
if endian == 'big':
    precTypecode = '>'
elif endian == 'little':
    precTypecode = '<'

# Below: 'i' is for signed integer and '2' is the size in bytes.
# Alternatively, you can use an if/else statement here to choose the precision
precTypecode += 'i2'

# inputFilename and outputFilename are assumed to be defined elsewhere
im = np.fromfile(inputFilename, dtype = precTypecode) # im is now a numpy array
# Perform any operations you desire on 'im', for example switching byteorder
im.byteswap(True)
# Then write to binary file (note: there are some limitations, so refer to the documentation)
im.tofile(outputFilename)

Hope this helps.

1

As asked, without any external modules:

with open("path/file.bin", "rb") as file:
    byte_content = file.read()
    list_16bits = [byte_content[i + 1] << 8 | byte_content[i] for i in range(0, len(byte_content), 2)]

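In the list comprehension, we read two bytes at a time and combine them with bitwise operations. Which of i+1 and i gets shifted depends on the endianness: as written, the byte at i+1 is the high byte, so the data is read as little-endian. Note that the resulting values are unsigned (0 to 65535), so signed data needs a further two's-complement conversion.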
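For the signed data in the question, the combined 16-bit word still has to be mapped through two's complement, and the byte order has to be selectable. A minimal sketch without any imports, assuming Python 3 (where indexing a `bytes` object yields integers); the function names `bytes_to_int16` and `int16_to_bytes` are hypothetical and not part of the original answer:

```python
def bytes_to_int16(raw, endian):
    """Decode bytes into a list of signed 16-bit ints; endian is 'big' or 'little'."""
    if len(raw) % 2:
        raise ValueError("expected an even number of bytes")
    values = []
    for i in range(0, len(raw), 2):
        if endian == "little":
            word = raw[i + 1] << 8 | raw[i]   # low byte comes first
        else:
            word = raw[i] << 8 | raw[i + 1]   # high byte comes first
        # Two's complement: a word with the top bit set stands for word - 65536
        values.append(word - 0x10000 if word & 0x8000 else word)
    return values

def int16_to_bytes(values, endian):
    """Encode signed 16-bit ints into bytes; endian is 'big' or 'little'."""
    out = bytearray()
    for v in values:
        if not -0x8000 <= v <= 0x7FFF:
            raise ValueError("value does not fit in 16 bits: %r" % (v,))
        word = v & 0xFFFF                     # two's-complement bit pattern
        hi, lo = word >> 8, word & 0xFF
        out += bytes((lo, hi)) if endian == "little" else bytes((hi, lo))
    return bytes(out)
```

The result of `int16_to_bytes` can be passed straight to `write()` on a file opened in `"wb"` mode, and the output of `read()` on a file opened in `"rb"` mode can be passed to `bytes_to_int16`.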

Guillaume Lebreton
1

Consider using

struct.unpack(byteorder + str(len(rawbytes) // 2) + "h", rawbytes)

where byteorder is '<' or '>' as desired, and similarly for packing. Note: I'm not claiming that this is faster than the array way, but I do note that the array way sometimes needs an additional byteswap step.
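To make the "similarly for packing" part concrete, here is a minimal sketch of both directions with a selectable byte order. The wrapper names `unpack_int16` and `pack_int16` are hypothetical; only `struct.pack` and `struct.unpack` come from the standard library.

```python
import struct

def unpack_int16(rawbytes, endian):
    """Decode bytes into a list of signed 16-bit ints; endian is 'big' or 'little'."""
    byteorder = "<" if endian == "little" else ">"
    # A repeat count in the format string decodes all values in one call.
    # An odd-length input raises struct.error.
    fmt = byteorder + str(len(rawbytes) // 2) + "h"
    return list(struct.unpack(fmt, rawbytes))

def pack_int16(values, endian):
    """Encode signed 16-bit ints into bytes; endian is 'big' or 'little'."""
    byteorder = "<" if endian == "little" else ">"
    # Values outside -32768..32767 raise struct.error.
    fmt = byteorder + str(len(values)) + "h"
    return struct.pack(fmt, *values)
```

With these, reading is `unpack_int16(f.read(), endian)` on a file opened in `"rb"` mode, and writing is `f.write(pack_int16(L, endian))` on a file opened in `"wb"` mode.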

John Machin
  • The `struct` way *always* needs an additional `unpack()` step. The main difference is that you will end up with a Python list, while you get an array when using `array.fromfile()`. – Sven Marnach Feb 20 '11 at 11:12