I read a question about how to use bisect on a list of tuples and used that information to answer another question. It works, but I'd like a more generic solution.

Since bisect doesn't allow specifying a key function, if I have this:

import bisect
test_array = [(1,2),(3,4),(5,6),(5,7000),(7,8),(9,10)]

and I want to find the first item where x > 5 for those (x,y) tuples (not considering y at all), I'm currently doing this:

bisect.bisect_left(test_array,(5,10000))

and I get the correct result because I know that no y is greater than 10000, so bisect points me to the index of (7,8). Had I put 1000 instead, it would have been wrong.

For integers, I could do

bisect.bisect_left(test_array,(5+1,))

but in the general case, when there may be floats, how do I do that without knowing the max value of the 2nd element?

test_array = [(1,2),(3,4),(5.2,6),(5.2,7000),(5.3,8),(9,10)]

I have tried this (min_value being the x threshold, 5.2 here):

bisect.bisect_left(test_array,(min_value+sys.float_info.epsilon,))

and it didn't work, but then I tried this:

bisect.bisect_left(test_array,(min_value+sys.float_info.epsilon*3,))

and it worked. But it feels like a bad hack. Any clean solutions?
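Why one epsilon fails and three succeed can be checked directly. A small sketch, assuming Python 3.9+ for `math.ulp` (the gap between a float and the next one up): `sys.float_info.epsilon` is that gap at 1.0, but floats near 5.2 are spaced four times wider, so adding one epsilon rounds straight back to 5.2, while adding three rounds up to the next float.

```python
import math
import sys

min_value = 5.2
eps = sys.float_info.epsilon  # 2**-52: the float spacing just above 1.0

# Floats in [4, 8) are spaced 2**-50 apart, i.e. 4 * epsilon.
print(math.ulp(min_value) == 4 * eps)   # True
print(min_value + eps == min_value)     # True: a quarter of the gap is rounded away
print(min_value + 3 * eps > min_value)  # True: three quarters rounds up to the next float
```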

Jean-François Fabre
  • There is this `SortedCollection` [code recipe](https://code.activestate.com/recipes/577197-sortedcollection/) that is recommended in the [bisect docs](https://docs.python.org/2/library/bisect.html) for using bisect with a key function. – user2390182 Feb 09 '17 at 20:55
  • yes; I could copy the bisect code and change the comparison function all right (not very convenient when you want to create a snappy answer) – Jean-François Fabre Feb 09 '17 at 21:05
  • @schwobaseggl nice find. I don't know how you can turn that into an answer without it being link-dependent. I'd upvote & accept if you found a way. When will they integrate such great recipes in libs or in the language itself? – Jean-François Fabre Feb 09 '17 at 21:10

4 Answers

As of Python 3.10, bisect finally supports key! So if you're on 3.10 or up, just use key. But if you're not...

bisect supports arbitrary sequences. If you need to use bisect with a key, instead of passing the key to bisect, you can build it into the sequence:

class KeyList(object):
    # bisect doesn't accept a key function before 3.10,
    # so we build the key into our sequence.
    def __init__(self, l, key):
        self.l = l
        self.key = key
    def __len__(self):
        return len(self.l)
    def __getitem__(self, index):
        return self.key(self.l[index])

Then you can use bisect with a KeyList, with O(log n) performance and no need to copy the bisect source or write your own binary search:

bisect.bisect_right(KeyList(test_array, key=lambda x: x[0]), 5)
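A self-contained sketch of the same idea on the question's float array (the KeyList class is repeated so the snippet runs on its own). It finds the first tuple whose x is strictly greater than min_value without ever comparing y. On Python 3.10+ the built-in key= parameter should give the same index, which the last lines show; note that there the key is applied to the list elements only, not to the searched value, so you pass the bare threshold rather than a tuple.

```python
import bisect
import sys
from operator import itemgetter


class KeyList(object):
    # Read-only view of a sequence that returns key(item) instead of item,
    # so bisect (which only needs __len__ and __getitem__) compares keys.
    def __init__(self, l, key):
        self.l = l
        self.key = key

    def __len__(self):
        return len(self.l)

    def __getitem__(self, index):
        return self.key(self.l[index])


test_array = [(1, 2), (3, 4), (5.2, 6), (5.2, 7000), (5.3, 8), (9, 10)]
min_value = 5.2
first_x = itemgetter(0)

# bisect_right skips every element whose key equals min_value.
idx = bisect.bisect_right(KeyList(test_array, key=first_x), min_value)
print(idx, test_array[idx])  # 4 (5.3, 8)

if sys.version_info >= (3, 10):
    # Native key support: the key is applied to list items, not to min_value.
    print(bisect.bisect_right(test_array, min_value, key=first_x))  # 4
```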
user2357112
  • I'm accepting that one, just because copying the source of bisect loses the advantage of using the compiled version, available on popular platforms. – Jean-François Fabre Oct 10 '17 at 19:12

This is a (quick'n'dirty) bisect_left implementation that allows an arbitrary key function:

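def bisect(lst, value, key=None):
    # Like bisect.bisect_left: return the first index i such that
    # key(lst[i]) >= value. lst must already be sorted by key.
    if key is None:
        key = lambda x: x
    lo, hi = 0, len(lst)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(lst[mid]) < value:
            lo = mid + 1  # everything up to mid is too small
        else:
            hi = mid  # mid may be the answer; keep it in range
    return lo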

>>> from operator import itemgetter
>>> test_array = [(1, 2), (3, 4), (4, 3), (5.2, 6), (5.2, 7000), (5.3, 8), (9, 10)]
>>> print(bisect(test_array, 5, key=itemgetter(0)))
3

This keeps the O(log N) performance since it does not assemble a new list of keys. Binary search implementations are widely available; this one was taken straight from the bisect_left source. Note that the list must be sorted by the same key function.
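The question asks for the first x > 5, which is bisect_right semantics rather than bisect_left. A sketch of a companion function (bisect_right_key is a hypothetical name, and the same sorted-by-key assumption applies) only flips the comparison, so elements whose key equals value are skipped:

```python
from operator import itemgetter


def bisect_right_key(lst, value, key=None):
    # Hypothetical helper: first index i with key(lst[i]) > value.
    if key is None:
        key = lambda x: x
    lo, hi = 0, len(lst)
    while lo < hi:
        mid = (lo + hi) // 2
        if value < key(lst[mid]):
            hi = mid  # lst[mid] is past value; the answer is at mid or to its left
        else:
            lo = mid + 1  # key(lst[mid]) <= value, equal keys included: skip it
    return lo


test_array = [(1, 2), (3, 4), (4, 3), (5.2, 6), (5.2, 7000), (5.3, 8), (9, 10)]
print(bisect_right_key(test_array, 5.2, key=itemgetter(0)))  # 5
```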

user2390182
  • this should be implemented in bisect_left perhaps as an option...have you considered it? – jimh Sep 28 '17 at 00:00

For this:

...want to find the first item where x > 5 for those (x,y) tuples (not considering y at all)

Something like:

import bisect
test_array = [(1,2),(3,4),(5,6),(5,7000),(7,8),(9,10)]

first_elem = [elem[0] for elem in test_array]
print(bisect.bisect_right(first_elem, 5))

bisect_right returns the index just past the last element equal to 5, and since you're only concerned with the first element of each tuple, this part seems straightforward. ...I realize this still doesn't generalize to an arbitrary key function.

As @Jean-FrançoisFabre pointed out, we're already processing the entire array, so using bisect may not even be very helpful.

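Not sure if it's any quicker, but we could alternatively do a lazy linear scan with a filter (itertools.ifilter in Python 2, the built-in filter in Python 3; yes, this is a bit ugly):

test_array = [(1,2),(3,4),(5,6),(5,7000),(7,8),(9,10)]

# enumerate yields (index, tuple) pairs; filter keeps those whose x > 5,
# and next() takes the first one without scanning the rest
print(next(filter(
    lambda tp: tp[1][0] > 5,
    enumerate(test_array)))[0]
)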
Gerrat
  • so it requires you to create an aux list, but if there are a lot of `bisect` to be done, this could even be faster because you're not even looking at the second element. not bad at all. – Jean-François Fabre Feb 09 '17 at 21:08
  • @Jean-FrançoisFabre: Yes, a bit of a trade-off unfortunately (no free lunches here). – Gerrat Feb 09 '17 at 21:10
  • since you're running through the list, you could also compute the max of the 2nd element (would avoid creating another list). – Jean-François Fabre Feb 09 '17 at 21:16
  • @Jean-FrançoisFabre: Yes...I suppose once one needs to run through the list, just using an actual loop may be better than using bisect at all – Gerrat Feb 09 '17 at 21:22
  • true if you have only 1 insertion to perform, but not true if you have a lot of insertions to do (in which case you would have to insert in your aux list / compare 2nd value of inserted tuple against max). Not trivial. – Jean-François Fabre Feb 09 '17 at 21:24
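A minimal sketch of the trade-off discussed in these comments, assuming many lookups and insertions: build the auxiliary key list once, and insert into both lists at the same position so they stay aligned. insert_sorted is a hypothetical helper name; each insertion is still O(n) because of list.insert, but every lookup is a plain bisect on the key list and never looks at y.

```python
import bisect

test_array = [(1, 2), (3, 4), (5, 6), (5, 7000), (7, 8), (9, 10)]
keys = [t[0] for t in test_array]  # built once, O(n)


def insert_sorted(items, keys, item):
    # Hypothetical helper: insert item after any equal keys, keeping both lists aligned.
    k = item[0]
    i = bisect.bisect_right(keys, k)
    keys.insert(i, k)
    items.insert(i, item)


insert_sorted(test_array, keys, (6, 1))
print(test_array[bisect.bisect_right(keys, 5)])  # (6, 1)
```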

In addition to the nice suggestions above, I'd like to add my own answer, which works with floats (I just figured it out):

bisect.bisect_left(test_array, (min_value + abs(min_value)*sys.float_info.epsilon,))

would work (whether min_value is positive or negative). epsilon multiplied by abs(min_value) is at least one unit in the last place of min_value, so it is not absorbed when added: the sum is strictly greater than min_value, and bisect works with that. It is not always the very next float (it can skip one representable value), and it fails when min_value is 0, since the increment is then 0 as well.

If you only have integers, this is still faster and clearer:

bisect.bisect_left(test_array,(min_value+1,))
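On Python 3.9 and later, math.nextafter(x, math.inf) returns exactly the next representable float above x, which gives the same sentinel-tuple trick without any epsilon arithmetic and also handles min_value == 0. A sketch, assuming 3.9+ and the float array from the question:

```python
import bisect
import math

test_array = [(1, 2), (3, 4), (5.2, 6), (5.2, 7000), (5.3, 8), (9, 10)]
min_value = 5.2

# The 1-tuple (next_up,) sorts after every (min_value, y) and before every
# tuple whose x is >= next_up, whatever y is (a shorter tuple prefix is smaller).
next_up = math.nextafter(min_value, math.inf)
idx = bisect.bisect_left(test_array, (next_up,))
print(idx, test_array[idx])  # 4 (5.3, 8)
```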
Jean-François Fabre