1

Let's consider a numpy array

a = array([1,2,25,13,10,9,4,5])

containing an even number of elements. I need to keep only one element out of every two, chosen at random: either the first or the second, then either the third or the fourth, etc. For example, using a, possible results are:

c = array([1,13,9,5])
d = array([2,13,10,4])
e = array([2,25,10,5])

I have to do this on long arrays of hundreds of elements, and on thousands of arrays inside large loops. What would be a faster approach than iterating over the elements and keeping or deleting one out of every two using pair_index+random.randint(0,1)? A generalised method that keeps one element out of every three, four, etc. would be nice ;-) Thanks
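For reference, the element-by-element approach described above might look like the following. This is only a sketch of the baseline: the function name `pick_one_per_group_loop` is hypothetical, and it assumes the array length is a multiple of the group size `k`.

```python
import random

import numpy as np


def pick_one_per_group_loop(a, k=2):
    """Baseline: walk over the groups in Python and keep one random element of each.

    Assumes len(a) is a multiple of k (hypothetical helper, for illustration).
    """
    out = []
    for start in range(0, len(a), k):
        # random.randint is inclusive on both ends, hence k - 1.
        out.append(a[start + random.randint(0, k - 1)])
    return np.array(out)


a = np.array([1, 2, 25, 13, 10, 9, 4, 5])
loop_result = pick_one_per_group_loop(a, k=2)
print(loop_result)
```

The Python-level loop runs once per group, which is the cost the vectorized answers avoid.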

results:

import timeit
import numpy

def soluce1():
    # Reshape into rows of k, then pick one column per row with numpy.choose.
    k = 2
    a = numpy.array([1, 2, 25, 13, 10, 9, 4, 5])
    aa = a.reshape(-1, k)
    i = numpy.random.randint(k, size=aa.shape[0])
    return numpy.choose(i, aa.T)


def soluce2():
    # Add a random offset in [0, k) to the start index of each group.
    k = 2
    a = numpy.array([1, 2, 25, 13, 10, 9, 4, 5])
    w = len(a) // k
    i = numpy.random.randint(0, k, w) + numpy.arange(0, k * w, k)
    return a[i]


def random_skip():
    # Same idea, with clipping to guard against out-of-bounds indices.
    a = numpy.array([1, 2, 25, 13, 10, 9, 4, 5])
    k = 2
    idx = numpy.arange(0, len(a), k)
    idx += numpy.random.randint(k, size=len(idx))
    idx = numpy.clip(idx, 0, len(a) - 1)
    return a[idx]


>     ts1=timeit.timeit(stmt='soluce1()',setup='from __main__ import soluce1',number=10000)
> --> 161 µs
>     ts2=timeit.timeit(stmt='soluce2()',setup='from __main__ import soluce2',number=10000)
> --> 159 µs
>     ts3=timeit.timeit(stmt='random_skip()',setup='from __main__ import random_skip',number=10000)
> --> 166 µs

The three proposals seem to be equivalent in speed. Thanks again, all.
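Since the question mentions thousands of arrays processed in a loop, one further option is to stack them and handle all of them in a single call. The sketch below is not one of the three timed proposals. It assumes the arrays share the same length, that this length is a multiple of `k`, and that they fit in one 2-D array. The name `pick_one_per_group_batch` is hypothetical.

```python
import numpy as np


def pick_one_per_group_batch(arrays, k=2, rng=None):
    """Pick one random element from each consecutive group of k, for every row.

    `arrays` is assumed to have shape (n_arrays, length) with length % k == 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    arrays = np.asarray(arrays)
    n_arrays, length = arrays.shape
    if length % k:
        raise ValueError("row length must be a multiple of k")
    # View each row as (length // k) groups of k elements.
    groups = arrays.reshape(n_arrays, length // k, k)
    # One random offset in [0, k) per group and per row.
    offsets = rng.integers(k, size=(n_arrays, length // k, 1))
    return np.take_along_axis(groups, offsets, axis=2)[..., 0]


batch = np.tile(np.array([1, 2, 25, 13, 10, 9, 4, 5]), (1000, 1))
picked = pick_one_per_group_batch(batch, k=2)
print(picked.shape)  # (1000, 4)
```

Whether this is faster in practice depends on the array sizes and on how the arrays are produced, so it is worth timing on the real data.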

sol

3 Answers

4

You can select the elements using fancy indexing, a[idx]:

import numpy as np

def random_skip(a, skipsize=2):
    idx = np.arange(0, len(a), skipsize)
    idx += np.random.randint(skipsize, size=len(idx))
    idx = np.clip(idx, 0, len(a)-1)
    return a[idx]


In [141]: a = array([1,2,25,13,10,9,4,5])        
In [142]: random_skip(a)
Out[142]: array([ 1, 13,  9,  4])

In [143]: random_skip(a, skipsize=3)
Out[143]: array([1, 9, 5])

In [144]: random_skip(a, skipsize=4)
Out[144]: array([ 1, 10])

idx = np.arange(0, len(a), skipsize) selects the first item in each group.

idx += np.random.randint(skipsize, size=len(idx)) randomizes the index to some item in each group.

idx = np.clip(idx, 0, len(a)-1) protects the index from going out of bounds in case the length of a is not a multiple of skipsize.
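One side effect of clipping is that, when the final group is shorter than skipsize, its last element is picked more often than the others. With 8 elements and skipsize=3, the offsets for the last group map to indices 6, 7 and 8, and 8 is clipped to 7, so index 7 is chosen two times out of three. If a uniform choice within the short group matters, a possible variant is sketched below. It is an alternative to the function above, not a replacement for it. The name `random_skip_uniform` is hypothetical, and it relies on `Generator.integers` accepting an array-valued upper bound.

```python
import numpy as np


def random_skip_uniform(a, skipsize=2, rng=None):
    """Pick one element per group, uniformly even in a shorter final group."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    starts = np.arange(0, len(a), skipsize)
    # Each group has skipsize elements, except possibly a shorter last one.
    sizes = np.minimum(skipsize, len(a) - starts)
    # Draw each offset from [0, size of its own group).
    offsets = rng.integers(0, sizes)
    return a[starts + offsets]


a = np.array([1, 2, 25, 13, 10, 9, 4, 5])
uniform_result = random_skip_uniform(a, skipsize=3)
print(uniform_result)
```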

unutbu
3

np.choose is useful for choosing elements from a group of arrays or from a 2-D array, and it is fast. So you can reshape your array to be Mx2 (Mxk in general) and select one element per row using np.choose:

a = array([1,2,25,13,10,9,4,5])
k = 2

aa = a.reshape(-1, k)  # 1dim -> 2dim
i = np.random.randint(k, size = aa.shape[0]) # random indices
np.choose(i, aa.T)
=> array([ 1, 13,  9,  4])
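np.choose only accepts a limited number of choice arrays (the exact limit depends on the NumPy version), which can matter for large k. As a hedged alternative, the same selection can be written with plain integer indexing on the reshaped array: pair each row number with its random column. The variable names follow the snippet above, and `result` is introduced here for illustration.

```python
import numpy as np

a = np.array([1, 2, 25, 13, 10, 9, 4, 5])
k = 4

aa = a.reshape(-1, k)                        # one row per group of k elements
i = np.random.randint(k, size=aa.shape[0])   # one random column per row
rows = np.arange(aa.shape[0])
result = aa[rows, i]                         # element (row, i[row]) for each row
print(result)
```

As with the np.choose version, this assumes the array length is a multiple of k, otherwise the reshape fails.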
shx2
1

One solution can be:

>>> xs
array([ 1,  2, 25, 13, 10,  9,  4,  5])
>>> k = len(xs) // 2
>>> i = np.random.randint(0, 2, k) + np.arange(0, 2 * k, 2) 
>>> xs[i]
array([ 1, 13, 10,  5])

This generalizes to other step sizes as long as the array length is a multiple of the step size. For example, for a step size of 4:

>>> k = len(xs) // 4
>>> i = np.random.randint(0, 4, k) + np.arange(0, 4 * k, 4)
>>> xs[i]
array([1, 5])

Alternatively (note the integer division, since reshape needs integer dimensions):

>>> np.apply_along_axis(np.random.choice, 1, xs.reshape(len(xs) // 2, 2))
array([ 2, 13, 10,  4])
>>> np.apply_along_axis(np.random.choice, 1, xs.reshape(len(xs) // 4, 4))
array([ 1, 10])
behzad.nouri