
Assume I have an array of tuples sorted by the first value. I want to find the first index where a condition on the first element of the tuple holds, i.e. how do I replace the following code

test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5
index = 0
# count elements until the first one whose first value exceeds min_value
for c in test_array:
    if c[0] > min_value:
        break
    else:
        index = index + 1

with the equivalent of MATLAB's `find`?

At the end of this loop I expect to get 3, but I'd like to make this more efficient. I am fine with using numpy for this. I tried using argmax, but to no avail.

Thanks

  • Don't you mean you want to find the _last_ index where the condition holds, rather than the first? Because that's what you're doing here. Can you add a brief example of how you would do this in matlab so we can better understand what you're asking? – pretzlstyle Feb 09 '17 at 20:12

3 Answers


Since the list is sorted, and if you know the maximum possible value of the second element (or if only one element can have a given first value), you can apply bisect to the list of tuples. It returns the insertion position that keeps the list sorted:

import bisect
test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5

print(bisect.bisect_left(test_array,(min_value,10000)))

Hardcoding 10000 is fragile, so if you only have integers you can do this instead (a one-element tuple sorts before every longer tuple that starts with the same value):

print(bisect.bisect_left(test_array,(min_value+1,)))

result: 3

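If you have floats (this also works with integers), you can use sys.float_info.epsilon like this, assuming min_value is positive (for zero or a negative min_value the product is not larger than min_value):

import sys
print(bisect.bisect_left(test_array,(min_value*(1+sys.float_info.epsilon),)))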
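A sketch of a sign-independent variant, assuming Python 3.9+ for `math.nextafter`: instead of scaling by `1 + epsilon`, step to the next representable float above the threshold. The helper name `first_index_greater` is just for illustration.

```python
import bisect
import math


def first_index_greater(pairs, threshold):
    """Index of the first tuple whose first value is > threshold.

    Assumes pairs is sorted by its first element.
    """
    # Smallest float strictly greater than threshold; unlike
    # threshold * (1 + epsilon) this also moves zero and negative values up.
    probe = (math.nextafter(threshold, math.inf),)
    # A one-element tuple sorts after every (threshold, ...) tuple
    # and before every tuple whose first value is >= probe.
    return bisect.bisect_left(pairs, probe)


test_array = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
print(first_index_greater(test_array, 5.0))  # 3
```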

It has O(log n) complexity, so it's much faster than a simple for loop (O(n) in the worst case) when there are many elements.

Jean-François Fabre

You can use numpy to mark the elements that satisfy the condition and then use argmax() to get the index of the first one:

import numpy
test_array = numpy.array([(1,2),(3,4),(5,6),(7,8),(9,10)])
min_value = 5

# argmax returns the position of the first True in the boolean mask
print((test_array[:,0] > min_value).argmax())

If you would like the indices of all elements that satisfy the condition, you can replace argmax() with nonzero()[0].
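A short sketch of both variants on the same data, plus a guard for the case where nothing matches: argmax on an all-False mask returns 0, which looks exactly like a genuine match at index 0. The names `mask`, `all_indices` and `first_index` are only illustrative.

```python
import numpy

test_array = numpy.array([(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)])
min_value = 5

mask = test_array[:, 0] > min_value  # [False, False, False, True, True]

# Indices of every match (the 0-based counterpart of MATLAB's find(mask)).
all_indices = mask.nonzero()[0]  # array([3, 4])

# argmax alone would return 0 for an all-False mask, so check any() first.
first_index = int(mask.argmax()) if mask.any() else None
print(all_indices, first_index)
```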

Yuval Atzmon
  • I'd say this is overkill – pretzlstyle Feb 09 '17 at 20:20
  • I would like the index where the condition holds. Further, this returned (3,4). I want to return the index of (7,8). Thanks – LostInTheFrequencyDomain Feb 09 '17 at 20:23
  • @jphollowed, what do you mean by overkill? That's a simple one line solution. – Yuval Atzmon Feb 09 '17 at 20:46
  • @user2476373 Just because numpy isn't necessary for something you can do trivially with python builtins. But more so because `nonzero()` is doing more than is needed. I don't consider it clean coding to immediately index the return of a function because you only even need part of it. Might as well use a more appropriate tool – pretzlstyle Feb 09 '17 at 21:28
  • @jphollowed, Thanks for the feedback. I edited the answer and replaced nonzero with argmax, and now it makes the solution cleaner. Re libraries, I disagree with you. I don't see a difference between `import bisect` as the accepted answer or `import numpy` as in this answer. I think numpy is preferable since it makes the code more concise, readable and, in many cases, faster – Yuval Atzmon Feb 10 '17 at 14:04

In general, numpy's where is used in a similar fashion to MATLAB's find. However, from an efficiency standpoint, I believe where cannot be told to return only the first element found, so it always processes the whole array. From a computational perspective, what you're doing here is therefore arguably not less efficient.

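The where equivalent would be:

import numpy
first_values = numpy.array([t[0] for t in test_array])
# where() returns a tuple of index arrays; the first match is element [0][0]
index = numpy.where(first_values > min_value)[0][0]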
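Since the first values are sorted, numpy also offers a binary search, `numpy.searchsorted`, which does not scan the whole array. This is a different function from where, shown here as a sketch assuming the data is sorted ascending as in the question:

```python
import numpy

test_array = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
min_value = 5

# Assumes the first values are sorted in ascending order.
first_values = numpy.array([t[0] for t in test_array])

# side='right' skips past elements equal to min_value, so the result is the
# first index whose value is strictly greater than min_value.
index = int(numpy.searchsorted(first_values, min_value, side='right'))
print(index)  # 3
```

Building `first_values` from a Python list is itself O(n), so this mainly pays off when the data already lives in a numpy array or you search it many times.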
heyiamt