I was looking for a way to parallelize for loops in Python 3. I found the joblib library, which is convenient to use, but I cannot figure out how to apply it to my problem. Is there a way to use joblib to parallelize the following for loop in Python 3?

from geopy import distance as geo_distance  # assumed import; the alias is not shown in the original post

def calculate_neighbour_trip(test_pickup, test_dropoff):
    # pickup_lat, pickup_long, dropoff_lat and dropoff_long are module-level sequences
    neighbour_trips_index = []

    for i in range(len(pickup_lat)):  # This loop needs to be parallelized
        pickup_train = (pickup_lat[i], pickup_long[i])
        pickup_rel_dist = geo_distance.vincenty(test_pickup, pickup_train).km
        if pickup_rel_dist <= 0.5:
            dropoff_train = (dropoff_lat[i], dropoff_long[i])
            dropoff_rel_dist = geo_distance.vincenty(test_dropoff, dropoff_train).km
            if dropoff_rel_dist <= 0.5:
                neighbour_trips_index.append(i)
    return neighbour_trips_index

The joblib documentation does not show how to parallelize a loop like this one.

Klaus
  • I imagine you actually want to parallelize `pickup_rel_dist = geo_distance.vincenty(test_pickup, pickup_train).km` part, the `for .. in` loop is as fast as it gets, parallelizing would only slow it down (one has to deal with disseminating it to different processes which adds a significant overhead). – zwer Sep 23 '18 at 05:42
  • That line is a direct call to a geopy.distance method; I cannot change anything there. – Klaus Sep 23 '18 at 05:48
  • You can parallelize the call to that method because, I presume, that's where the bottleneck actually is (i.e. the execution of that method is what takes significant enough time to warrant multiprocessing). – zwer Sep 23 '18 at 05:53
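
Following the comments above, one possible way to spread the distance calls over several processes is to split the trips into chunks and hand each chunk to joblib as one task, so the per-task overhead is paid once per chunk rather than once per trip. This is only a sketch under some assumptions: `haversine_km` is a stand-in for `geo_distance.vincenty(...).km` so that the example runs without geopy, the coordinates are passed in as lists of `(lat, long)` tuples instead of being read from globals, and `neighbours_in_chunk`, `n_jobs` and `chunk_size` are names made up for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, long) points in degrees.

    Stand-in for geo_distance.vincenty(a, b).km (assumption, not the geopy API).
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def neighbours_in_chunk(start, pickups, dropoffs, test_pickup, test_dropoff,
                        radius_km=0.5):
    """Scan one chunk serially and return the global indices of matching trips."""
    hits = []
    for offset, (pickup, dropoff) in enumerate(zip(pickups, dropoffs)):
        # Same short-circuit as the original loop: the drop-off distance is
        # only computed when the pickup is already within the radius.
        if (haversine_km(test_pickup, pickup) <= radius_km
                and haversine_km(test_dropoff, dropoff) <= radius_km):
            hits.append(start + offset)
    return hits


def calculate_neighbour_trip(test_pickup, test_dropoff, pickups, dropoffs,
                             n_jobs=1, chunk_size=10_000):
    """Return indices of trips whose pickup and drop-off are both within 0.5 km."""
    tasks = [(s, pickups[s:s + chunk_size], dropoffs[s:s + chunk_size])
             for s in range(0, len(pickups), chunk_size)]
    if n_jobs == 1:
        # Serial path: no worker processes, no joblib needed.
        results = [neighbours_in_chunk(s, p, d, test_pickup, test_dropoff)
                   for s, p, d in tasks]
    else:
        # Imported here so the serial path works without joblib installed.
        from joblib import Parallel, delayed
        results = Parallel(n_jobs=n_jobs)(
            delayed(neighbours_in_chunk)(s, p, d, test_pickup, test_dropoff)
            for s, p, d in tasks)
    # joblib returns results in task order, so the indices stay sorted.
    return [i for chunk in results for i in chunk]
```

A call such as `calculate_neighbour_trip(test_pickup, test_dropoff, pickups, dropoffs, n_jobs=4)` would then use four worker processes. Whether this is actually faster depends on how many trips there are and how expensive each distance call is, so it is worth timing against `n_jobs=1` and tuning `chunk_size`. To use geopy again, swap `haversine_km` for the real distance call; as far as I know, geopy 2.0 and later no longer include `vincenty`, and `geodesic` is the replacement.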

0 Answers