I have a 2-D array

data = [[1,2,3...], [4,5,6...], [7,8,9...], ...]

and a 1-D array that contains the minimum value of each sub-array above:

minima = [1, 4, 7, ...], so len(minima) == len(data).

Now I want to set a threshold, say threshold = 7, and delete every sub-array of data whose minimum is below it. I tried the following:

threshold = 7
for i in range(len(minima)):
    if minima[i] < threshold:
        data = np.delete(data, i, 1)

but this raises an IndexError: index 225 is out of bounds for axis 1 with size 225

I guess it has something to do with the axis, and a loop is probably not the best approach, but my expertise is very limited. I appreciate your help!

Jailbone
    An (overly?) detailed explanation as to why `delete` (even in a list) is dangerous in a loop: https://stackoverflow.com/questions/61013951/how-to-use-delete-to-delete-specific-array-in-numpy-when-iterate-it/61015621#61015621 – hpaulj Apr 04 '20 at 18:39

1 Answer

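The reason this fails is that every deletion shrinks the array, while i keeps incrementing up to the original length, so it eventually points past the end. Note also that the third argument of np.delete is the axis: axis=1 removes columns, whereas removing a sub-array (a row) requires axis=0.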
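To make the index shift concrete, here is a minimal sketch on a made-up 4x3 array (the data and the variable failed_at are illustrative assumptions, not taken from the question). It tries to delete every row one at a time and records where the loop breaks:

```python
import numpy as np

# Hypothetical toy data: 4 rows, 3 columns
data = np.arange(12).reshape(4, 3)
original_rows = len(data)

failed_at = None
for i in range(original_rows):
    try:
        # Each call returns a new, shorter array, so the rows after i shift up by one
        data = np.delete(data, i, axis=0)
    except IndexError:
        # i is now beyond the last valid row index of the shrunken array
        failed_at = i
        break

print(failed_at)   # index at which the loop ran off the end
print(data)        # rows that were skipped rather than deleted
```

Here the loop stops at i = 2 with only two rows removed, and the surviving rows are the ones that slid into an index the loop had already passed.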

In any case, deleting one row per loop iteration is not very efficient. NumPy gets its speed from performing operations in bulk, and np.delete returns a new copy of the array on every call. By iterating, your algorithm will likely not be much faster than a plain Python program that does not use NumPy at all.

You can just filter the array with a boolean mask (this assumes minima is a NumPy array rather than a plain list):

data2 = data[minima >= threshold]

Note that you do not need to store the row-wise minimum separately; you can compute it on the fly with:

data2 = data[data.min(axis=1) >= threshold]
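As a self-contained sketch of the filtering approach, using a small made-up array (the sample values and the names row_min and mask are assumptions for illustration):

```python
import numpy as np

# Hypothetical sample data: each row is one sub-array
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])
threshold = 7

row_min = data.min(axis=1)     # minimum of each row
mask = row_min >= threshold    # True for rows to keep
data2 = data[mask]             # boolean indexing selects whole rows

print(mask)
print(data2)
```

A row whose minimum equals the threshold is kept, since only rows with a minimum strictly below the threshold are dropped.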
Willem Van Onsem