Reading 32-bit signed IEEE 754 floating-point numbers from a binary file with Python?

Question

I have a binary file which is simply a list of signed 32-bit IEEE 754 floating-point numbers. They are not separated by anything and simply appear one after another until EOF.

How would I read from this file and interpret them correctly as floating point numbers?

I tried using read(4), but it automatically converts them to a string with ASCII encoding.

I also tried using bytearray, but that only takes it in 1 byte at a time instead of the 4 bytes at a time I need.

score 34 · Accepted Answer · answered Jun 08 '11 at 22:30


import struct
struct.unpack('f', file.read(4))  # returns a 1-tuple; take [0] for the float itself

You can also unpack several at once, which will be faster:

struct.unpack('f'*n, file.read(4*n))
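Because the file in the question has no header and simply runs until EOF, n can be derived from the file size instead of being known up front. The following is a minimal sketch, assuming the whole file fits in memory and that the data is little-endian; the function names and the byte_order parameter are illustrative, not part of the answer. A bare 'f' uses native byte order, which only matches the file if it was written on a machine with the same endianness, so the sketch states the byte order explicitly.

```python
import struct

def unpack_floats(data, byte_order='<'):
    """Unpack a bytes object holding back-to-back 32-bit IEEE 754 floats.

    byte_order: '<' little-endian (assumed default), '>' big-endian,
    '=' native byte order without alignment padding.
    """
    if len(data) % 4:
        raise ValueError('length is not a multiple of 4 bytes')
    n = len(data) // 4  # one single-precision float per 4 bytes
    return struct.unpack('{0}{1}f'.format(byte_order, n), data)

def read_all_floats(path, byte_order='<'):
    """Read the whole file into memory and unpack every float until EOF."""
    with open(path, 'rb') as fh:
        return unpack_floats(fh.read(), byte_order)
```

Usage would look like read_all_floats('data.bin'), where 'data.bin' is a hypothetical filename. An empty file yields an empty tuple.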


Marcelo Cantos



+1 for the 'f'*n; where is that syntax documented? I must have missed that in my Python primer. – Andrew White Jun 08 '11 at 22:37

String multiplication is documented in the tutorial and in the library reference section on sequence objects. – Thomas Wouters Jun 08 '11 at 22:38
@Andrew: There's a brief mention of this in the tutorial, under [Strings](http://docs.python.org/tutorial/introduction.html#strings). Search for "repeated". – Marcelo Cantos Jun 08 '11 at 22:40

The more general way of unpacking several would be `unpack('{0}f'.format(n), ...)`, or if you know how many in advance then just `unpack('10f', ...)` for example. Better to use the built-in repetition method than rely on string manipulation. – Scott Griffiths Jun 09 '11 at 07:54
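To make the two spellings discussed in these comments concrete, here is a small illustrative check (the values and n = 5 are arbitrary) that string multiplication and the struct repeat count describe the same layout and decode the same bytes; only the length of the format string differs. Native byte order is used for both packing and unpacking so the comparison holds on any machine.

```python
import struct

n = 5
data = struct.pack('5f', 0.5, 1.5, 2.5, 3.5, 4.5)

# String multiplication: the format string is n characters long ('fffff')
fmt_mult = 'f' * n
# Built-in repeat count: the format string stays short regardless of n ('5f')
fmt_count = '{0}f'.format(n)

# Both describe n consecutive 4-byte floats with no padding between them
assert struct.calcsize(fmt_mult) == struct.calcsize(fmt_count) == 4 * n
print(struct.unpack(fmt_mult, data) == struct.unpack(fmt_count, data))
```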

@cdiggins: I tend to favour whatever requires the least amount of typing and is easiest to read. These two factors occasionally clash, so you may have to trade one off against the other, but in this case my version is both shorter *and* clearer. Performance-wise, I expect the two forms to be almost identical, since the bulk of the time is spent in the I/O subsystem. If the length is known at coding time, then I agree that `'10f'` is better, for exactly the same reasons: it is slightly shorter and easier to read than `'f'*10`. – Marcelo Cantos Mar 10 '12 at 23:18

@Marcelo, I agree with the principle but consider unpacking 100,000 ints. It doesn't make sense to me to create a format string that is 100k long. Instead '{0}f'.format(100000) makes more sense. – cdiggins Mar 11 '12 at 14:11
@cdiggins: What doesn't make sense? At 100000 elements, my version is 10% (12 µs) slower, and remains noticeably shorter and clearer. – Marcelo Cantos Mar 12 '12 at 03:08
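Anyone wanting to reproduce the comparison above on their own Python version could use a rough timing harness like the following. It is only a sketch: it times unpacking from an in-memory buffer, so it deliberately leaves out the file I/O that the earlier comment says dominates in practice, and it makes no claim about which form will win.

```python
import struct
import timeit

n = 100000
data = struct.pack('{0}f'.format(n), *([1.0] * n))

def with_mult():
    # Format string built by string multiplication: n characters long
    return struct.unpack('f' * n, data)

def with_count():
    # Format string using the built-in repeat count: a few characters long
    return struct.unpack('{0}f'.format(n), data)

# Both forms decode the same bytes; only the format string differs
for fn in (with_mult, with_count):
    secs = timeit.timeit(fn, number=20)
    print('{0}: {1:.4f} s for 20 calls'.format(fn.__name__, secs))
```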

score 5 · Answer 2 · answered Jun 08 '11 at 22:30


Take a peek at struct.unpack. Something like the following might work...

import struct
(f,) = struct.unpack('f', data_read)  # data_read: the 4 bytes returned by read(4); unpack returns a tuple


Andrew White


score 3 · Answer 3 · answered Jun 08 '11 at 22:32


import struct
(num,) = struct.unpack('f', f.read(4))  # f: a file opened in binary mode ('rb')
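Extending this one-value-at-a-time pattern to the whole file gives a loop that stops at EOF without loading everything into memory. This is a minimal sketch, assuming little-endian data; iter_floats is a hypothetical helper name, and the precompiled struct.Struct is a choice made here to avoid re-parsing the format on every call.

```python
import struct

FLOAT = struct.Struct('<f')  # assumption: little-endian data; use '>f' for big-endian

def iter_floats(fh):
    """Yield floats one at a time from a binary file object until EOF."""
    while True:
        chunk = fh.read(FLOAT.size)
        if len(chunk) < FLOAT.size:
            # b'' means a clean EOF; anything else is a truncated trailing value
            if chunk:
                raise ValueError('trailing partial value of %d bytes' % len(chunk))
            return
        (num,) = FLOAT.unpack(chunk)
        yield num
```

For data already in memory, FLOAT.iter_unpack(data) (Python 3.4+) yields the same values as 1-tuples.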


Chris Eberle


score 0 · Answer 4 · answered Dec 15 '22 at 08:06

The fastest approach (in terms of performance) I have found so far is numpy.fromfile:

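import numpy as np

class FloatReader:
    def __init__(self, filename):
        # Binary mode; the file stays open across read_floats() calls
        self.f = open(filename, "rb")

    def read_floats(self, count: int):
        # Read up to `count` float32 values (native byte order) from the current position
        return np.fromfile(self.f, dtype=np.float32, count=count, sep='')

    def close(self):
        self.f.close()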

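In my experience this approach is much faster than struct.unpack, because NumPy reads the raw bytes straight into a float32 array instead of creating a Python float object for every value.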
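For reading the whole file in one call, a path-based variant of the same idea is sketched below. It assumes the file is little-endian: the dtype string '<f4' pins that byte order, whereas np.float32 on its own means native byte order. The function name read_floats_numpy is hypothetical; count=-1 asks np.fromfile for every item until EOF.

```python
import numpy as np

def read_floats_numpy(path, count=-1):
    # '<f4' = little-endian 32-bit IEEE 754 float (assumed file layout);
    # count=-1 reads all remaining items
    return np.fromfile(path, dtype='<f4', count=count)
```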
