I have a Python script that calls OpenTripPlanner to calculate a travel time matrix between pairs of geolocated points. The script uses a for loop to calculate one matrix for every departure time and saves each one as a separate .csv file.

I would like to parallelize this operation to make it faster. The task seems to be embarrassingly parallel, but I know virtually nothing about Python. From what I've read, my hunch is that the best solution would be multithreading, keeping each `spt` (shortest path tree) in its own thread.

I would really appreciate some help. A simple reproducible example is available here.

This is the Python script:

#!/usr/bin/jython
from org.opentripplanner.scripting.api import OtpsEntryPoint


# Instantiate an OtpsEntryPoint
otp = OtpsEntryPoint.fromArgs(['--graphs', '.',
                               '--router', 'sto'])

# Start timing the code
import time
start_time = time.time()

# Get the default router
router = otp.getRouter('sto')


# Read the origins and destinations. The file centroids_sto.csv must contain
# the coordinate columns X and Y, plus the 'idhex' ID column used below.
points = otp.loadCSVPopulation('centroids_sto.csv', 'Y', 'X')
dests = otp.loadCSVPopulation('centroids_sto.csv', 'Y', 'X')


for h in range(7, 19):
  for m in range(0,60,30):

    # Create a default request for a given time
    req = otp.createRequest()
    req.setDateTime(2015, 12, 28, h, m, 00)
    req.setMaxTimeSec(3600) # 1h = 3600 seconds , 2h = 7200 seconds
    req.setModes('WALK,TRANSIT,BUS,TRAM,RAIL,SUBWAY')  # ("TRAM,RAIL,SUBWAY,FUNICULAR,GONDOLA,CABLE_CAR,BUS")


    # Create a CSV output
    matrixCsv = otp.createCSVOutput()
    matrixCsv.setHeader([ 'mode', 'depart_time', 'origin', 'destination', 'walk_distance', 'travel_time' ]) # travel_time in seconds

    # Start Loop
    for origin in points:
      print "Processing origin: ", str(h)+"-"+str(m)," ", origin.getStringData('idhex')
      req.setOrigin(origin)
      spt = router.plan(req)
      if spt is None: continue

      # Evaluate the SPT for all points
      result = spt.eval(dests)

      # Add a new row of result in the CSV output
      for r in result:
        matrixCsv.addRow([ 'public transport', str(h) + ":" + str(m) + ":00", origin.getStringData('idhex'), r.getIndividual().getStringData('idhex'), r.getWalkDistance() , r.getTime()])

    # Save the result
    matrixCsv.save('traveltime_matrix_sto_pt_'+ str(h)+"-"+str(m) + '.csv')


# Stop timing the code
print("Elapsed time was %g seconds" % (time.time() - start_time))
rafa.pereira
  • Could you make the part you want threaded more explicit? For example, pull it out into a function such as `function_to_thread`, since that will be useful in the threading call. – raphael Nov 21 '16 at 01:21
  • @raphael I think the simplest way would be to allocate a subset of origins to each thread. OTP would still work on one travel time matrix (departure time) at a time, but each core would work on a different slice of origins. I think the multithreading function would mostly concentrate on this part of the code: `req.setOrigin(origin)` `spt = router.plan(req)` – rafa.pereira Nov 21 '16 at 06:37
  • Your code is too complex to be easily reproduced. Give us the CSV and a simpler example. That said, I suggest using this easy library: https://pythonhosted.org/joblib/parallel.html. Decide which loop is the longest and parallelize that one (the days? the hours? the points?). Then copy a `req` object and wrap everything to be parallelized in a function that you pass to the library. – marcodena Nov 26 '16 at 12:46
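
The idea from the comments (give each thread its own slice of origins) can be sketched with the standard `threading` module. This is only a minimal sketch under stated assumptions, not a tested OTP solution: `split_into_chunks`, `run_in_threads` and `fake_process_origin` are hypothetical names introduced here, and `fake_process_origin` stands in for the real per-origin work (`req.setOrigin(origin)`, `router.plan(req)`, `spt.eval(dests)`). It also assumes that the OTP router can be shared between threads and that each thread builds its own request object, which would need to be checked against OTP itself.

```python
import threading


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal slices."""
    items = list(items)
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def run_in_threads(origins, process_origin, n_threads=4):
    """Call process_origin(origin) for every origin, one chunk per thread.

    process_origin must return a list of rows. The rows are returned
    in the original order of the origins.
    """
    chunks = split_into_chunks(origins, n_threads)
    results = [None] * len(chunks)

    def worker(index, chunk):
        rows = []
        for origin in chunk:
            rows.extend(process_origin(origin))
        results[index] = rows  # each thread writes only to its own slot

    threads = [threading.Thread(target=worker, args=(i, chunk))
               for i, chunk in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every slice before merging

    return [row for rows in results for row in rows]


# Hypothetical stand-in for the OTP calls. In the real script this function
# would create its own request, set the origin, plan, and evaluate the SPT.
def fake_process_origin(origin):
    return [(origin, dest, abs(origin - dest)) for dest in range(3)]


rows = run_in_threads(range(10), fake_process_origin, n_threads=4)
```

Collecting the rows per thread and calling `matrixCsv.addRow(...)` only after all threads have joined avoids concurrent writes to the shared CSV output object, whose thread safety is unknown here.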

0 Answers