Python, create numpy recarray efficiently

Question

Today I use the code below to create a numpy recarray. I am fairly sure it can be written more efficiently, but I am not sure how. The inputs are t and p: each step gives a duration in seconds (t) and a power level (p). The output is a recarray with one record per second.

## New cycle
import numpy as np
t = np.array([30, 60,  60,  60, 120, 120, 150, 600])
p = np.array([0, 200, 300, 400, 350,  50, 400,   0])

time = np.arange(t.sum())
power = np.ones(len(time))

for i in range(len(t)):
    if i ==0:
        power[0:t[i]] = p[i]
    else:
        power[t.cumsum()[i-1] : t.cumsum()[i]] = p[i]
listTuples = [(time[i], power[i]) for i in range(len(time))]
inputArray = np.array(listTuples, dtype=[('time', '<f8'), ('PlossTotal', '<f8')])

score 0 · Accepted Answer · answered Apr 01 '22 at 08:47

I believe the easiest way could be to:

1. Zip the lists t and p.
2. Use Python's list multiplication to create a list of the required length for each power value (e.g. [1] * 5 is [1, 1, 1, 1, 1]).
3. Convert each list to a numpy array.
4. Concatenate (stack) all the arrays.
5. If you need an array of tuples, you can get it using enumerate.

Code:

import numpy as np
t = np.array([30, 60,  60,  60, 120, 120, 150, 600])
p = np.array([0, 200, 300, 400, 350,  50, 400,   0])

res = np.hstack([np.array([ep] * et) for ep, et in zip(p, t)])
res_tuples = np.array(list(enumerate(res)), dtype=[('time', '<f8'), ('PlossTotal', '<f8')])
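The list multiplication and stacking steps above can probably be collapsed into a single call to np.repeat, which repeats each element of p the number of times given by the matching element of t. This is only a sketch of that idea, using the same t and p as the question; it has not been timed here.

```python
import numpy as np

t = np.array([30, 60, 60, 60, 120, 120, 150, 600])
p = np.array([0, 200, 300, 400, 350, 50, 400, 0])

# np.repeat(p, t) repeats p[i] exactly t[i] times, so no Python-level
# lists or per-step arrays are needed.
res = np.repeat(p, t)

# Same conversion to an array of (time, power) records as in the answer.
res_tuples = np.array(list(enumerate(res)),
                      dtype=[('time', '<f8'), ('PlossTotal', '<f8')])
```

The result has one entry per second, so its length equals t.sum().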

score 0 · Answer 2 · answered Apr 01 '22 at 18:55

Your time and power arrays are:

In [25]: time
Out[25]: array([   0,    1,    2, ..., 1197, 1198, 1199])
In [26]: power
Out[26]: array([0., 0., 0., ..., 0., 0., 0.])

Time to fill power:

In [28]: %%timeit
    ...: for i in range(len(t)):
    ...:     if i ==0:
    ...:         power[0:t[i]] = p[i]
    ...:     else:
    ...:         power[t.cumsum()[i-1] : t.cumsum()[i]] = p[i]    
69.2 µs ± 85 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

This is iterative, so I suspect it can be sped up, but I won't get into the details now. Note that the truncated display of power shows only zeros because the first 30 and the last 600 entries are 0; the entries in between are not. Creating the array from the list of tuples takes substantially longer:

In [29]: %%timeit
    ...: listTuples = [(time[i], power[i]) for i in range(len(time))]
    ...: inputArray = np.array(listTuples, dtype=[('time', '<f8'), ('PlossTotal', '<f8')])

668 µs ± 533 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

An alternative way of filling a structured array is to assign values by field. Since the number of fields is small (compared to the number of records) this is often faster.

In [32]: arr = np.zeros(time.shape, dtype=[('time', '<f8'), ('PlossTotal', '<f8')])
In [33]: arr['time'] = time
In [34]: arr['PlossTotal'] = power
In [35]: inputArray==arr
Out[35]: array([ True,  True,  True, ...,  True,  True,  True])

and the timing:

In [36]: %%timeit
    ...: arr = np.zeros(time.shape, dtype=[('time', '<f8'), ('PlossTotal', '<f8')])
    ...: arr['time'] = time
    ...: arr['PlossTotal'] = power
7.53 µs ± 8.59 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

With the structured array now this cheap to build, it is worthwhile to speed up the creation of power as well. I have not timed alternatives for that step here.
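As one possible loop-free way to fill power, the sketch below looks up the step index of every second with np.searchsorted against the cumulative step durations. The names edges and idx are mine, not from the question, and I have not timed this, so treat it as an idea to benchmark rather than a proven speed-up.

```python
import numpy as np

t = np.array([30, 60, 60, 60, 120, 120, 150, 600])
p = np.array([0, 200, 300, 400, 350, 50, 400, 0])

time = np.arange(t.sum())

# edges[i] is the first second that no longer belongs to step i.
edges = t.cumsum()

# side='right' sends second 30 to step 1 (not step 0), matching the
# half-open slices power[start:stop] used in the original loop.
idx = np.searchsorted(edges, time, side='right')

# Pick the power of the step each second falls into.
power = p[idx].astype(float)
```

Computing t.cumsum() once also avoids recomputing it twice per iteration, as the original loop does.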

===

In [38]: %%timeit
    ...: res = np.hstack([np.array([ep] * et) for ep, et in zip(p, t)])
    ...: res_tuples = np.array(list(enumerate(res)), dtype=[('time', '<f8'), ('PlossTotal', '<f8')])
576 µs ± 474 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
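The question asks for a recarray, while filling by field as shown earlier gives a plain structured array. If attribute access to the fields is wanted, a structured array can be viewed as np.recarray. The sketch below uses three made-up records purely for illustration; only the dtype is taken from the question.

```python
import numpy as np

# Small structured array with the question's dtype; the values are
# illustrative only, not the real cycle.
arr = np.zeros(3, dtype=[('time', '<f8'), ('PlossTotal', '<f8')])
arr['time'] = [0, 1, 2]
arr['PlossTotal'] = [0, 200, 200]

# A view reinterprets the same memory as a recarray; no data is copied.
rec = arr.view(np.recarray)

# Fields are now reachable as attributes as well as by key.
first_power = rec.PlossTotal[1]
```

Because rec is a view, writing to rec also changes arr.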
