
I consider myself an experienced numpy user, but I'm not able to find a solution for the following problem. Assume there are the following arrays:

# sorted array of times
t = numpy.cumsum(numpy.random.random(size=100))
# some values associated with the times
x = numpy.random.random(size=100)
# some indices into the time/data arrays
indices = numpy.cumsum(numpy.random.randint(low=1, high=10, size=20))
indices = indices[indices < 90]  # respect size of 100
if len(indices) % 2:  # make number of indices even
    indices = indices[:-1]

# select some starting and end indices
istart = indices[0::2]
iend   = indices[1::2]

What I now want is to reduce the value array x over the intervals denoted by istart and iend, i.e.

# e.g. use max reduce; I'll probably also need mean and std
# (note: "is" is a reserved word in Python, so use ib/ie as loop variables)
what_i_want = numpy.array([numpy.max(x[ib:ie]) for ib, ie in zip(istart, iend)])

I have already googled a lot, but all I could find were blockwise operations via stride_tricks, which only allow for regular blocks. I was not able to find a solution without performing a Python loop :-( In my real application the arrays are much larger and performance does matter, so I use numba.jit for the moment.

Is there any numpy function I'm missing which is able to do that?

Marti Nito

3 Answers


Have you looked at ufunc.reduceat? With np.maximum, you can do something like:

>>> np.maximum.reduceat(x, indices)

which yields the maximum values over the slices x[indices[i]:indices[i+1]]. To get what you want (x[indices[2i]:indices[2i+1]]), you could do

>>> np.maximum.reduceat(x, indices)[::2]

if you don't mind the extra computations over x[indices[2i-1]:indices[2i]]. This yields the following:

>>> numpy.array([numpy.max(x[ib:ie]) for ib,ie in zip(istart,iend)])
array([ 0.60265618,  0.97866485,  0.78869449,  0.79371198,  0.15463711,
        0.72413702,  0.97669218,  0.86605981])

>>> np.maximum.reduceat(x, indices)[::2]
array([ 0.60265618,  0.97866485,  0.78869449,  0.79371198,  0.15463711,
        0.72413702,  0.97669218,  0.86605981])
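The same trick extends to the mean and std mentioned in the question. There is no ufunc for std, but both moments can be assembled from np.add.reduceat sums of x and x**2 (a sketch; the index values below are made up for illustration):

```python
import numpy as np

x = np.random.random(size=100)
indices = np.array([3, 10, 17, 30, 41, 55])  # example start/end pairs
istart, iend = indices[0::2], indices[1::2]

counts = (iend - istart).astype(float)
# sum over every slice indices[i]:indices[i+1], then keep the even ones
sums = np.add.reduceat(x, indices)[::2]
sqsums = np.add.reduceat(x**2, indices)[::2]

means = sums / counts
stds = np.sqrt(sqsums / counts - means**2)  # population std, like np.std
```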
wflynny
  • Awesome, that's exactly what I was looking for. I can just keep all indices in one array, then I don't do any overhead computations :) Maybe I should improve my googling skills... – Marti Nito Nov 21 '16 at 16:48

You can use numpy.r_, like this:

what_i_want = np.array([np.max(x[np.r_[ib:ie]]) for ib,ie in zip(istart,iend)])
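For reference, np.r_ translates slice notation into a concatenated index array, so x[np.r_[ib:ie]] indexes the same elements as x[ib:ie] (a small illustration with made-up values):

```python
import numpy as np

# np.r_ concatenates ranges built from slice notation
idx = np.r_[2:5, 8:10]   # -> array([2, 3, 4, 8, 9])
x = np.arange(20) * 10
x[idx]                   # -> array([20, 30, 40, 80, 90])
```

Note that this still runs one Python iteration per interval, so it does not remove the loop the question asks about.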
piRSquared

(A non-numpy solution, using astropy)

import numpy as np
from astropy.nddata.utils import block_reduce
data = np.arange(16).reshape(4, 4)
block_reduce(data, 2)    

will convert:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

to:

array([[10, 18],
       [42, 50]])

See the astropy block_reduce documentation for more examples.
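If you'd rather avoid the astropy dependency, the same 2x2 block sum can be reproduced in plain numpy with a reshape (a sketch that assumes the array shape is an exact multiple of the block size):

```python
import numpy as np

data = np.arange(16).reshape(4, 4)
# split each axis into (n_blocks, block_size), then sum within each block
reduced = data.reshape(2, 2, 2, 2).sum(axis=(1, 3))
# reduced -> array([[10, 18],
#                   [42, 50]])
```

Keep in mind that, like block_reduce, this only handles regular blocks, not the irregular intervals in the question.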

Azim