
I want to convert a Python float into a byte array, encoding it as a 32 bit little-endian IEEE floating point number, in order to write it to a binary file.

What is the modern Pythonic way to do that in Python 3? For ints I can do my_int.to_bytes(4,'little'), but there is no to_bytes method for floats.

It's even better if I can do this in one shot for every float in a numpy array (with dtype numpy.float32). But note that I need to get it as a byte array, not just write the array to a file immediately.

There are some similar-sounding questions, but they seem mostly to be about getting the hex digits, not writing to a binary file.

N. Virgo
  • https://docs.python.org/3.7/library/struct.html – freakish Nov 12 '19 at 07:16
  • The right tools for manipulating individual, native Python scalars are usually not the right tools for manipulating NumPy arrays. If you want a NumPy solution, I recommend specifically asking about NumPy and leaving regular Python types out (and expect to get non-NumPy answers anyway from people who don't know NumPy). – user2357112 Nov 12 '19 at 07:18
  • @user2357112 I'd be happy with a non-numpy answer, since I'm writing the floats one at a time. I mentioned numpy mostly because a numpy solution won't hurt (I'm importing it anyway) and might be useful to know in the future. – N. Virgo Nov 12 '19 at 07:22
  • You might want to try to find a way to avoid writing them one at a time. That'll be slow. – user2357112 Nov 12 '19 at 07:24
  • @user2357112 you're right. Luckily, the numpy solutions have enabled me to do that :) – N. Virgo Nov 12 '19 at 07:37

4 Answers


NumPy arrays come with a tobytes method that gives you a dump of their raw data bytes:

arr.tobytes()

You can specify an order argument to use either C-order (row major) or F-order (column major) for multidimensional arrays.
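As a minimal sketch of both points (values and array shape here are just for illustration), the order argument only matters for multidimensional arrays, where it controls whether rows or columns are laid out contiguously:

```python
import numpy as np

arr = np.array([[1.0, 2.0],
                [3.0, 4.0]], dtype=np.float32)

c_bytes = arr.tobytes(order='C')  # row-major: 1, 2, 3, 4
f_bytes = arr.tobytes(order='F')  # column-major: 1, 3, 2, 4

print(len(c_bytes))  # 16: four float32 values, 4 bytes each
```

Both dumps contain the same 16 bytes, just in a different element order.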

Since you ultimately want the data in a file, you may also be interested in the tofile method, which writes the raw bytes to a file directly:

arr.tofile(your_file)

tofile always uses C-order.

If you need to change endianness, you can use the byteswap method. (newbyteorder has a more convenient signature, but doesn't change the underlying bytes, so it won't affect tobytes.)

import sys
if sys.byteorder=='big':
    arr = arr.byteswap()
data_bytes = arr.tobytes()
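Putting it together (the sample values are just for illustration), one way to sanity-check the result is to compare against struct's explicit little-endian packing of each element:

```python
import struct
import sys
import numpy as np

values = [0.5, 3.14, -1.0]
arr = np.array(values, dtype=np.float32)

# Ensure the in-memory layout is little-endian before dumping.
if sys.byteorder == 'big':
    arr = arr.byteswap()
data_bytes = arr.tobytes()

# Cross-check: each 4-byte group should match struct's '<f' packing.
expected = b''.join(struct.pack('<f', v) for v in values)
assert data_bytes == expected
```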
user2357112

You can use struct to pack a float into bytes:

>>> import struct
>>> struct.pack('<f', 3.14) # little-endian
b'\xc3\xf5H@'
>>> struct.pack('>f', 3.14) # big-endian
b'@H\xf5\xc3'
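If you have several floats, struct can also pack them all in one call by putting a repeat count in the format string (a small sketch; the values are just examples):

```python
import struct

values = [1.0, 2.5, -3.25]

# '<3f' means three consecutive little-endian 32-bit floats.
data = struct.pack(f'<{len(values)}f', *values)
print(len(data))  # 12

# Round-trip check (these values are exactly representable in float32).
print(struct.unpack('<3f', data))  # (1.0, 2.5, -3.25)
```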
han solo

With the right dtype you can dump the array's data buffer to a bytes object or write it to a binary file:

In [449]: x = np.arange(4., dtype='<f4')                                        
In [450]: x                                                                     
Out[450]: array([0., 1., 2., 3.], dtype=float32)
In [451]: txt = x.tobytes()
In [452]: txt                                                                   
Out[452]: b'\x00\x00\x00\x00\x00\x00\x80?\x00\x00\x00@\x00\x00@@'
In [453]: x.tofile('test')                                                                                                                           
In [455]: np.fromfile('test','<f4')                                             
Out[455]: array([0., 1., 2., 3.], dtype=float32)
In [459]: with open('test','br') as f: print(f.read())                          
b'\x00\x00\x00\x00\x00\x00\x80?\x00\x00\x00@\x00\x00@@'

Change endianness:

In [460]: x.astype('>f4').tobytes()                                             
Out[460]: b'\x00\x00\x00\x00?\x80\x00\x00@\x00\x00\x00@@\x00\x00'
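The dtype-based round trip also works in memory, not just via a file: np.frombuffer reinterprets the raw bytes as an array again (a short sketch; note that shape information is not stored in the bytes and must be supplied separately for multidimensional arrays):

```python
import numpy as np

x = np.arange(4., dtype='<f4')
buf = x.tobytes()  # 16 bytes of little-endian float32 data

# Recover the array from the raw bytes.
y = np.frombuffer(buf, dtype='<f4')
print(y)  # [0. 1. 2. 3.]
```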
hpaulj

NumPy also provides save/savez methods:

Store data to disk, and load it again:

>>> np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
>>> np.load('/tmp/123.npy')
array([[1, 2, 3],
       [4, 5, 6]])

Store compressed data to disk, and load it again:

>>> a=np.array([[1, 2, 3], [4, 5, 6]])
>>> b=np.array([1, 2])
>>> np.savez('/tmp/123.npz', a=a, b=b)
>>> data = np.load('/tmp/123.npz')
>>> data['a']
array([[1, 2, 3],
       [4, 5, 6]])
>>> data['b']
array([1, 2])
>>> data.close()
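Since the question asks for a byte array rather than an immediate file write, it may be worth noting that np.save accepts any file-like object, so an io.BytesIO gives you the .npy bytes (header plus raw data) in memory (a sketch; the array values are just examples):

```python
import io
import numpy as np

arr = np.array([1.5, 2.5], dtype=np.float32)

# Write the .npy representation into an in-memory buffer.
buf = io.BytesIO()
np.save(buf, arr)
npy_bytes = buf.getvalue()

# The format starts with a magic string identifying it as .npy.
print(npy_bytes[:6])  # b'\x93NUMPY'

# Load back from the same bytes.
buf.seek(0)
restored = np.load(buf)
```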
lenik
  • `save` differs from `tofile` in that it also saves the `shape` and `dtype` in an initial data block. – hpaulj Nov 12 '19 at 07:27
  • @hpaulj `shape` and `dtype` don't take much space, but play the crucial role in flawless reading and converting the data back instead of getting a bunch of binary garbage. – lenik Nov 12 '19 at 07:30
  • As a complete file format, `.npy` is definitely more informative and useful than just a raw byte dump. As an intermediate representation or a component of a larger file, a raw byte dump may be significantly more useful. – user2357112 Nov 12 '19 at 07:40