
I have a sorted CSV file with multiple columns, and I want to return the value or the index of an item in column 1. The file has around 300,000 to 400,000 values, so I'm trying to avoid any min function, since it would probably take too long and I need the value in under a second.

So what I'm doing is collecting the entries of column 1 into a list via:

import csv

array = []
with open('example.csv', 'r') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        array.append(int(row[0]))
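As a minimal sketch of the loading step, the same loop can be wrapped in a function that accepts any open text stream, which makes it easy to try without a file on disk. The name `load_first_column` and the sample data are hypothetical; the `;` delimiter and the integer first column are taken from the snippet above.

```python
import csv
import io


def load_first_column(stream, delimiter=';'):
    """Return the first column of a delimited text stream as a list of ints."""
    reader = csv.reader(stream, delimiter=delimiter)
    # Skip blank lines, which csv.reader yields as empty lists.
    return [int(row[0]) for row in reader if row]


# Hypothetical sample standing in for example.csv
sample = io.StringIO("123;a\n124;b\n125;c\n128;d\n")
array = load_first_column(sample)
print(array)  # [123, 124, 125, 128]
```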

And now comes the tricky part: I couldn't find a suitable function or example that looks for a lower or equal value. I tried editing this example I found on stackoverflow.com:

import bisect

def find_closest(t):
    idx = bisect.bisect_left(array, t)  # Find insertion point
    if idx == len(array):  # t is larger than every entry
        return array[-1]
    # Check whether the entry at idx or idx - 1 is closer
    if idx > 0 and abs(array[idx] - t) > abs(array[idx - 1] - t):
        idx -= 1
    return array[idx]

This example returns the closest value, whether it is lower, equal or greater. But I couldn't manage to change it to do what I want.

As an example with numbers what I'm looking for is:

array=[123,123,123,124,125,125,125,128,128,128,128]
value1=124
value2=127

So when looking for value1, it should return return1=124 or its index. When the value isn't in the list, as with value2, it should return the highest value that is lower than the searched value: return2=125, even if the higher value, 128, is closer.
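The behaviour described here is a "rightmost value less than or equal to x" lookup. A minimal sketch using `bisect.bisect_right` follows; the function name `find_le` and the choice to return `None` when every entry is larger than the search value are assumptions, not something the question specifies.

```python
import bisect


def find_le(sorted_values, x):
    """Return the largest entry <= x, or None if every entry is greater than x."""
    # bisect_right gives the insertion point after any entries equal to x,
    # so the entry just before it is the largest one that is <= x.
    i = bisect.bisect_right(sorted_values, x)
    if i == 0:
        return None  # assumption: signal "no match" with None
    return sorted_values[i - 1]


array = [123, 123, 123, 124, 125, 125, 125, 128, 128, 128, 128]
print(find_le(array, 124))  # 124
print(find_le(array, 127))  # 125
print(find_le(array, 100))  # None
```

Each lookup is a binary search, so a list of 400,000 entries needs roughly 19 comparison steps rather than a full scan.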

I tried using the bisect module but I failed miserably. Any tips are appreciated.

Greetings

lc123
Max
  • if column 1 is sorted then binary search (bisection) is your friend, so perhaps expand on "I failed miserably". – Tommy Apr 13 '16 at 18:57

1 Answer


This is assuming that you have a sorted list:

import bisect

def foo(the_list, value):
    index = bisect.bisect_left(the_list, value)
    hit = index < len(the_list) and (the_list[index] == value or index == 0)  # also guards index == len(the_list)
    return the_list[index] if hit else the_list[index - 1]
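Since the question also asks for the index, here is a hedged sketch of an index-returning variant. With duplicates in the list there are two reasonable answers, the last or the first position of the matched value; the names `index_le_last` and `index_le_first` and the `-1` sentinel for "no match" are assumptions for illustration.

```python
import bisect


def index_le_last(sorted_values, x):
    """Index of the last entry <= x, or -1 if there is none."""
    return bisect.bisect_right(sorted_values, x) - 1


def index_le_first(sorted_values, x):
    """Index of the first occurrence of the largest entry <= x, or -1."""
    last = index_le_last(sorted_values, x)
    if last < 0:
        return -1
    # bisect_left finds the leftmost position of the matched value.
    return bisect.bisect_left(sorted_values, sorted_values[last])


the_list = [123, 123, 123, 124, 125, 125, 125, 128, 128, 128, 128]
print(index_le_last(the_list, 127))   # 6
print(index_le_first(the_list, 127))  # 4
print(index_le_last(the_list, 100))   # -1
```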
Aquiles