25

I have a binary file which is simply a list of signed 32-bit IEEE 754 floating point numbers. They are not separated by anything, and simply appear one after another until EOF.

How would I read from this file and interpret them correctly as floating point numbers?

I tried using read(4), but it automatically converts the bytes to a string with ASCII encoding.

I also tried using bytearray, but that only takes the data in 1 byte at a time instead of the 4 bytes at a time that I need.

Razor Storm

4 Answers

34
struct.unpack('f', file.read(4))

You can also unpack several at once, which will be faster:

struct.unpack('f'*n, file.read(4*n))
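As a rough sketch of how this might look for a whole file: the helper name `read_floats` and the little-endian `'<'` prefix are assumptions made for illustration (a plain `'f'` uses the machine's native byte order). The file has to be opened in binary mode (`'rb'`) so that `read` returns raw bytes.

```python
import struct

def read_floats(path):
    """Read every 32-bit float from a headerless binary file (hypothetical helper)."""
    with open(path, "rb") as fh:       # binary mode: read() returns bytes
        data = fh.read()
    n = len(data) // 4                 # number of complete 4-byte floats
    # '<' = little-endian (an assumption); '%df' % n = n consecutive floats
    return list(struct.unpack("<%df" % n, data[:4 * n]))
```

Any trailing bytes that do not make up a full 4-byte value are ignored here; whether that is the right behaviour depends on the file.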
Marcelo Cantos
  • 1
    +1 for the 'f'*n; where is that syntax documented? I must have missed that in my Python primer. – Andrew White Jun 08 '11 at 22:37
  • 3
    String multiplication is documented in the tutorial and in the library reference section on sequence objects. – Thomas Wouters Jun 08 '11 at 22:38
  • @Andrew: There's a brief mention of this in the tutorial, under [Strings](http://docs.python.org/tutorial/introduction.html#strings). Search for "repeated". – Marcelo Cantos Jun 08 '11 at 22:40
  • 4
    The more general way of unpacking several would be `unpack('{0}f'.format(n), ...)`, or if you know how many in advance then just `unpack('10f', ...)` for example. Better to use the in-built repetition method than rely on string manipulation. – Scott Griffiths Jun 09 '11 at 07:54
  • 2
    @cdiggins: I tend to favour whatever requires the least amount of typing and is easiest to read. These two factors occasionally clash, so you may have to trade one off against the other, but in this case my version is both shorter *and* clearer. Performance-wise, I expect the two forms to be almost identical, since the bulk of the time is spent in the I/O subsystem. If the length is known at coding time, then I agree that `'10f'` is better, for exactly the same reasons: it is slightly shorter and easier to read than `'f'*10`. – Marcelo Cantos Mar 10 '12 at 23:18
  • 5
    @Marcelo, I agree with the principle but consider unpacking 100,000 ints. It doesn't make sense to me to create a format string that is 100k long. Instead `'{0}f'.format(100000)` makes more sense. – cdiggins Mar 11 '12 at 14:11
  • @cdiggins: What doesn't make sense? At 100000 elements, my version is 10% (12 µs) slower, and remains noticeably shorter and clearer. – Marcelo Cantos Mar 12 '12 at 03:08
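The two spellings discussed in these comments describe the same layout. A small check, using `struct.calcsize` and made-up sample values, might look like this:

```python
import struct

n = 5
repeated = "f" * n          # 'fffff'
counted = "{0}f".format(n)  # '5f'

# Both formats describe n consecutive native floats of the same total size.
assert struct.calcsize(repeated) == struct.calcsize(counted) == 4 * n

data = struct.pack(counted, 1.0, 2.0, 3.0, 4.0, 5.0)  # sample values
same = struct.unpack(repeated, data) == struct.unpack(counted, data)
```

The only difference is the length of the format string itself, which grows with `n` in the repeated form and stays short in the counted form.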
5

Take a peek at struct.unpack. Something like the following might work...

# unpack returns a one-element tuple, so take element 0
f = struct.unpack('f', data_read)[0]
Andrew White
3
import struct
(num,) = struct.unpack('f', f.read(4))
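To keep going until EOF, the same single-value unpack could be wrapped in a loop. This is only a sketch: `iter_floats` is a hypothetical helper name, and it assumes the file object was opened in binary mode and holds native-byte-order floats.

```python
import struct

def iter_floats(fh):
    """Yield floats one at a time from a binary file object (hypothetical helper)."""
    while True:
        chunk = fh.read(4)
        if len(chunk) < 4:      # EOF, or a trailing partial value that is ignored
            break
        (num,) = struct.unpack("f", chunk)  # unpack always returns a tuple
        yield num
```

Something like `for x in iter_floats(open(path, "rb")): ...` would then walk the file one value at a time, which is simple but slower than unpacking many values per call.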
Chris Eberle
0

The fastest approach I have found so far is numpy.fromfile:

import numpy as np

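class FloatReader:
    def __init__(self, filename):
        # Binary mode, so the bytes reach NumPy untouched
        self.f = open(filename, "rb")

    def read_floats(self, count: int):
        # Read up to `count` float32 values from the current file position
        return np.fromfile(self.f, dtype=np.float32, count=count, sep='')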

This approach is much faster than struct.unpack!
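If the whole file is wanted at once, a shorter variant might pass the path straight to `np.fromfile`. The helper name `load_floats` and its `byteorder` parameter are assumptions for illustration; `np.float32` on its own uses the machine's native byte order, so spelling the order out only matters when the file was written on a different platform.

```python
import numpy as np

def load_floats(path, byteorder="<"):
    """Load all 32-bit floats from a headerless binary file (hypothetical helper)."""
    # '<f4' = little-endian float32, '>f4' = big-endian; count=-1 reads to EOF
    return np.fromfile(path, dtype=byteorder + "f4", count=-1)
```

The result is a one-dimensional NumPy array rather than a tuple of Python floats.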

Anatoly