I am using Python. I have two lists: list 1 contains 7000 integers and list 2 contains 25000 integers. For each number in list 1, I want to find the closest number in list 2 that is bigger and the closest number in list 2 that is smaller, and then calculate the difference between those two numbers from list 2. So far I have:

for i in list1:
    for j in list 2:
        if list2[j]<list1[i]:
            a = max(list2)
        elif list2[j]>list1[i]:
            b = min(list2)
            interval = b-a

This doesn't seem to work. I want to find the explicit numbers in list 2 that are less than a specific number in list 1 and know the maximum, and then find out the smallest number in list 2 that is bigger than the number in list 1. Does anyone have any ideas? Thanks

ppwater

4 Answers

You can use the bisect module, worst case complexity O(N * logN):

import bisect
lis1 = [4, 20, 26, 27, 30, 53, 57, 76, 89, 101]
lis2 = [17, 21, 40, 49, 53, 53, 53, 53, 70, 80, 81, 95, 99] #this must be sorted
#use lis2.sort() in case lis2 is not sorted
for x in lis1:
    # index where x could be inserted in lis2 (after any equal items), keeping lis2 sorted
    ind = bisect.bisect(lis2, x)
    if not (x >= lis2[-1] or x <= lis2[0]):
        sm, bi = lis2[ind-1], lis2[ind]

        if sm == x:
            # x itself occurs in lis2 (possibly several times, e.g. 53 here):
            # step left past every copy to reach the closest smaller value
            ind -= 1
            while lis2[ind] == x:
                ind -= 1
            sm = lis2[ind]

        print("{} <= {} <= {}".format(sm, x, bi))

output:

17 <= 20 <= 21
21 <= 26 <= 40
21 <= 27 <= 40
21 <= 30 <= 40
49 <= 53 <= 70
53 <= 57 <= 70
70 <= 76 <= 80
81 <= 89 <= 95

Note that this prints nothing for 4 and 101, because 4 is smaller than every element in lis2 and 101 is greater than every element in lis2. That can be fixed if required.
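
One possible way to handle those boundary cases is sketched below. It is only an illustration, not part of the original answer: the helper name `neighbours` is hypothetical, and returning `None` for a missing neighbour is an assumption about what the caller wants. It uses `bisect_left` and `bisect_right` together, so repeated values such as 53 need no extra loop.

```python
import bisect

def neighbours(sorted_values, x):
    """Hypothetical helper: return (closest smaller, closest larger) for x,
    using None where no such element exists. sorted_values must be sorted."""
    lo = bisect.bisect_left(sorted_values, x)    # number of elements < x
    hi = bisect.bisect_right(sorted_values, x)   # number of elements <= x
    smaller = sorted_values[lo - 1] if lo > 0 else None
    larger = sorted_values[hi] if hi < len(sorted_values) else None
    return smaller, larger

lis2 = [17, 21, 40, 49, 53, 53, 53, 53, 70, 80, 81, 95, 99]
for x in [4, 53, 101]:
    print(x, neighbours(lis2, x))
```

The interval asked for in the question is then `larger - smaller` whenever both values are not `None`.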

Ashwini Chaudhary

Here's a vectorized solution using NumPy. It should be extremely fast, as it has no loops in Python (apart from the printing stage at the end).

import numpy as np

# set up fake data
l1 = np.array([1.9, 2, 2.1]) # or whatever list you have
l2 = np.array([1, 2, 5, 10]) # as above
l2.sort() # remove this line if it's always sorted

# the actual algorithm
indexes = np.searchsorted(l2, l1, side='right')
lower = l2[indexes - 1]
upper = l2[indexes]
diffs = upper - lower

# print results for debugging
for value, diff in zip(l1, diffs):
    print("value", value, "gap", diff)

Here's the output with the hard-coded test data as above:

value 1.9 gap 1
value 2.0 gap 3
value 2.1 gap 3
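
Two details of the snippet above are worth knowing. With `side='right'`, a value that also occurs in `l2` is used as its own lower neighbour (2.0 gives lower = 2, hence gap 3), and a value at or beyond the largest element of `l2` makes `l2[indexes]` raise an `IndexError`. The following is a minimal sketch of one way to get strictly smaller and strictly larger neighbours and to mark out-of-range values; the function name `neighbour_gaps` and the choice of NaN for "no interval" are assumptions made for illustration.

```python
import numpy as np

def neighbour_gaps(values, reference):
    """Hypothetical helper: for each entry of `values`, the gap between the
    closest strictly smaller and strictly larger entries of `reference`.
    Entries that lack either neighbour get NaN."""
    ref = np.sort(np.asarray(reference))
    vals = np.asarray(values)
    left = np.searchsorted(ref, vals, side='left')    # count of ref entries < value
    right = np.searchsorted(ref, vals, side='right')  # count of ref entries <= value
    valid = (left > 0) & (right < len(ref))           # both neighbours exist
    gaps = np.full(vals.shape, np.nan)
    gaps[valid] = ref[right[valid]] - ref[left[valid] - 1]
    return gaps

print(neighbour_gaps([1.9, 2, 2.1, 0, 11], [1, 2, 5, 10]))
```

With this variant the value 2 yields a gap of 4 (between 1 and 5), while 0 and 11 yield NaN because one neighbour is missing.
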
John Zwinck

First of all, your example is not valid code, or at least it doesn't do what you want it to do. If you have

for i in list1:

then i is not the index, but an element of list1. So first of all you would compare i and j, not list1[i] and list2[j].

It should be easier to use list comprehensions:

for i in list1:
    a = max([n for n in list2 if n < i])
    b = min([n for n in list2 if n > i])

You might have to add an if or two to make sure a and b exist, but it should work like this.
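
As a sketch of what those extra checks could look like: in Python 3.4 and later, `max` and `min` accept a `default` argument, so an empty selection gives `None` instead of raising `ValueError`. The sample data and the `intervals` dictionary below are made up for illustration.

```python
list1 = [4, 20, 53, 101]
list2 = [17, 21, 40, 49, 53, 53, 70, 80, 81, 95, 99]

intervals = {}
for i in list1:
    # generator expressions avoid building temporary lists
    a = max((n for n in list2 if n < i), default=None)  # closest smaller, if any
    b = min((n for n in list2 if n > i), default=None)  # closest bigger, if any
    if a is not None and b is not None:
        intervals[i] = b - a

print(intervals)
```

Each pass still scans all of list2, so the total work grows with `len(list1) * len(list2)`.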

Vikas
Harpe
  • You don't need the second loop `for j in list2:`, do you? – Vikas May 28 '13 at 12:14
  • +1 for list comprehensions, although generator expressions might be a bit easier to read: `a = max(n for n in list2 if n < i)`. I like how (either of) these read as being almost directly from the spec the OP gave: "a is the maximum of those elements in `list2` that are lower than this particular element of `list1`". Also, `a` and `b` will certainly exist here: if the argument to `max` or `min` amounts to an empty sequence, they will raise `ValueError` (and, in Python 3, `TypeError` if it contains types that can't be mutually ordered). – lvc May 28 '13 at 12:30
  • yes that's great! thank you so much. it only worked when i didn't include the second for loop, that's correct. – user2428358 May 28 '13 at 12:35
  • @wim `max` and `min` are both linear. The loop over `list1` makes it `O(len(list1) * len(list2))`. This isn't *wonderful*, but calling it obnoxious is a bit harsh. – lvc May 28 '13 at 12:50
  • yeah, the question has been edited - when I voted and commented it had another loop over list2 which did all of nothing, and made it O(n^3) but another user fixed it now. – wim May 28 '13 at 13:00
  • @wim if you're happy that the problem you downvoted for is fixed, perhaps undoing your vote would be in order? – lvc May 28 '13 at 13:11
  • You don't need to generate a list inside `max()` and `min()` as they will work on sequences. So the `[ ]` is not required. – Burhan Khalid May 29 '13 at 03:58

Here's a solution that doesn't use numpy, the bisect module or list comprehensions. It assumes both lists are sorted. Enjoy!

list1=[1,2,4,8,16,32,64]
list2=[3,6,9,12,15,18,21]

correct={4:3, 8:3, 16:3}

lower = 0
for t in list1:
    print(t)
    difference = 0
    index = lower  # both lists are sorted, so resume from the last match
    while difference == 0 and index < len(list2) - 1:
        print("consider %d < %d and %d > %d" % (list2[index], t, list2[index+1], t))
        if list2[index] < t and list2[index+1] > t:
            lower = index
            upper = index + 1
            difference = list2[upper] - list2[lower]
            print("%d difference %d" % (t, difference))
            break
        index = index + 1

    # check against the hand-computed expected gaps
    if t in correct:
        assert difference == correct[t]
Vorsprung