Replace a loop in python with the equivalent of a matlab find

Question

Assume I have a sorted array of tuples which is sorted by the first value. I want to find the first index where a condition on the first element of the tuple holds. i.e. How do I replace the following code

test_array = [(1,2),(3,4),(5,6),(7,8),)(9,10)]
min_value = 5
index = 0
for c in test_array:
        if c[0] > min_value:
           break
        else:
            index = index + 1

With the equivalent of a matlab find ?

i.e. At the end of this loop I expect to get 3 but I'd like to make this more efficient. I an fine with using numpy for this. I tried using argmax but to no avail.

Thanks

Don't you mean you want to find the _last_ index where the condition holds, rather than the first? Because that's what you're doing here. Can you add a brief example of how you would do this in matlab so we can better understand what you're asking? — pretzlstyle, Feb 09 '17 at 20:12

Jean-François Fabre · Accepted Answer · 2017-02-10T13:16:48.137

4

Since the list is sorted and if you know the max possible value for the second element (or if there can only be 1 element with the same first value), you could apply bisect on the list of tuples (returns the sorted insertion position in the list)

import bisect
test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5

print(bisect.bisect_left(test_array,(min_value,10000)))

Hardcoding to 10000 is bad, so if you only have integers you can do that instead:

print(bisect.bisect_left(test_array,(min_value+1,)))

result: 3

if you had floats (also works with integers) you could use sys.float_info.epsilon like this:

print(bisect.bisect_left(test_array,(min_value*(1+sys.float_info.epsilon),)))

It has O(log(n)) complexity so it's much better than a simple for loop when there are a lot of elements.

edited Feb 10 '17 at 13:16

answered Feb 09 '17 at 20:19

Jean-François Fabre

137,073
23
153
219

I did not know about bisect. Excellent! Thanks. – LostInTheFrequencyDomain Feb 09 '17 at 20:26
yes, that's a good one. See my edit. Depending on the data, my previous answer could fail. – Jean-François Fabre Feb 09 '17 at 20:27
It appears you can use None as the second parameter instead of 1000 as well. – LostInTheFrequencyDomain Feb 09 '17 at 20:45
not in python 3 you can't `unorderable types: NoneType() < int()`. But without element it works (but it's minimum value, not maximum): `>>> (5,) < (5,-30000) yields True` – Jean-François Fabre Feb 09 '17 at 20:47

Yuval Atzmon · Answer 2 · 2017-02-10T14:06:44.820

0

You can use numpy to indicate the elements that obey the conditions and then use argmax(), to get the index of the first one

import numpy
test_array = numpy.array([(1,2),(3,4),(5,6),(7,8),(9,10)])
min_value = 5

print (test_array[:,0]>min_value).argmax()

if you would like to find all of the elements that obey the condition, use can replace argmax() by nonzero()[0]

edited Feb 10 '17 at 14:06

answered Feb 09 '17 at 20:16

Yuval Atzmon

5,645
3
41
74

I'd say this is overkill – pretzlstyle Feb 09 '17 at 20:20
I would like the index where the condition holds. Further, this returned (3,4). I want to return the index of (7,8). Thanks – LostInTheFrequencyDomain Feb 09 '17 at 20:23
@jphollowed, what do you mean by overkill? That's a simple one line solution. – Yuval Atzmon Feb 09 '17 at 20:46
@user2476373 Just because numpy isn't necessary for something you can do trivially with python builtins. But more so because `nonzero()` is doing more than is needed. I don't consider it clean coding to immediately index the return of a function because you only even need part of it. Might as well use a more appropriate tool – pretzlstyle Feb 09 '17 at 21:28
@jphollowed, Thanks for the feedback. I edited the answer and replaced nonzero with argmax, and now it makes the solution cleaner. Re libraries, I disagree with you. I don't see a difference between `import bisect` as the accepted answer or `import numpy` as in this answer. I think numpy is preferable since it makes the code more concise, readble and in many times, faster – Yuval Atzmon Feb 10 '17 at 14:04

score 0 · Answer 3 · answered Feb 09 '17 at 20:23

In general, numpy's where is used in a fashion similar to MATLAB's find. However, from an efficiency standpoint, I where cannot be controlled to return only the first element found. So, from a computational perspective, what you're doing here is not arguably less inefficient.

The where equivalent would be

index = numpy.where(numpy.array([t[0] for t in test_array]) >= min_value)
index = index[0] - 1

Replace a loop in python with the equivalent of a matlab find

3 Answers3

Linked