Python-numpy reading bytes and offset to signed int32

Question

I have an operation to apply in python to more than 10 millions values. My problem is to optimise the actual operation. I have 2 working methods, numpy and python vanilla.

Python vanilla operation:

1: My raw value is a 4 byte data: b'\x9a#\xe6\x00' = [154, 35, 230, 0] = [0x9A, 0x23, 0xE6, 0x00]
2: I take the last byte and put it in first: b'\x00\x9a#\xe6' = [0, 154, 35, 230] = [0x00, 0x9A, 0x23, 0xE6]
3: I tranform it into a int32 signed value: -433874432

File loading:

f = open(path_data, "rb")
while trame := f.read(4):

Data operation:

trame = b'\x9a#\xe6\x00'
trame_list = list(trame)  # [154, 35, 230, 0]
trame_list_swap = [trame_list[-1]] + trame_list[:-1]
trame_swap = bytes(trame_list_swap)
result = int.from_bytes(trame_swap, byteorder='little', signed=True)

Numpy operation

File loading:

datas_raw = numpy.fromfile(path_data, dtype="<i4")
# datas_raw = numpy.array([-1708923392, 1639068928, 2024603392, ...])  # len(datas_raw) = 12171264
for i, trame in enumerate(datas_raw):

Data operation:

trame = 15082394
tmp = list(trame.tobytes("C"))
tmp.insert(0, tmp.pop())
result = numpy.ndarray(1, "<i", bytes(tmp))[0]

It is doing the same processing than vanilla but slower here because of numpy.ndarray that is triggered 10 millions times...

Question

My question is the following:

I would like for numpy version to operate on all value the bitwise operation without for loop (that are very slow in python)... Any other solution for the issue is welcome (not closed XY problem...)

Kevin S · Accepted Answer · 2022-01-07T13:30:17.500

1

Here I use some random data in place of data read from file, which you can do using np.loadtxt. Ideally, you would read your bytes into a 1-d array with shape (4*n,) and then reshape to be (n,4).

import numpy as np
rng = np.random.default_rng(0)
data = rng.integers(-2**31,2**31,size=10000,dtype="i4")
data = data.view("u1").reshape((-1,4))
# Last column first, then other 3
data = data[:,[3,0,1,2]]
# Depending on platform might need to specify byteorder, e.g., "<i4" or ">i4"
ints = np.ascontiguousarray(data).view("i4")

This produces values like

array([[-1031643175],
       [  267112355],
       [ -640212606],
       ...,

This returns an array with shape (n,1) of signed integers.

edited Jan 07 '22 at 13:30

answered Jan 07 '22 at 13:02

Kevin S

2,595
16
22

Thank you for your answer, My variable `datas_raw` is `numpy.array([-1708923392, 1639068928, 2024603392, ...])` To get the list of list of 4 int8, I have to apply `list(trame.tobytes("C"))` but how to do it without for loop? Your answer is usefull but I did not match with inputs... – Vincent Bénet Jan 07 '22 at 13:19
I changed the answer to reflect your input data structure. All you need is a `view` to make it bytes, then the pivot, then the second view. – Kevin S Jan 07 '22 at 13:31
Wow! It seems it is exactly what i needed! I test it and if ok full upvote your answer! Thank you – Vincent Bénet Jan 07 '22 at 13:33
The process is now 80x faster, thank you! – Vincent Bénet Jan 07 '22 at 13:45

Python-numpy reading bytes and offset to signed int32

Python vanilla operation:

Numpy operation

Question

1 Answers1