My question is about applying a complicated function to every row of a table.
I'm trying to find the travel time and route for pairs of points using the osrm package in R (https://cran.r-project.org/web/packages/osrm/osrm.pdf). My data looks like this, where each row is a pair of origin-destination points:
ID_o | ID_d | longitude_o | latitude_o | longitude_d | latitude_d |
---|---|---|---|---|---|
1 | 5 | -122.2925 | 47.72932 | -122.2820 | 47.73027 |
2 | 6 | -122.2820 | 47.73027 | -122.2944 | 47.72293 |
3 | 7 | -122.3365 | 47.72512 | -122.3153 | 47.71490 |
4 | 8 | -122.3264 | 47.70752 | -122.3151 | 47.70674 |
I can use the osrmRoute function from osrm to obtain the route for any one row:
time.route1 <- osrmRoute(src = mydata[1, c('longitude_o', 'latitude_o')],
dst = mydata[1, c('longitude_d', 'latitude_d')],
returnclass = "sf")
I can also write a loop to compute what I need for multiple rows:
time.route2 <- data.frame(matrix(, nrow=4, ncol=5))
for (ix in c(1:4) ) {
route.temp <- osrmRoute(src = mydata[ix, c('longitude_o', 'latitude_o')],
dst = mydata[ix, c('longitude_d', 'latitude_d')],
returnclass = "sf")
time.route2[ix, ] <- route.temp
}
Here I simply apply the function to each row sequentially. But the loop runs slowly (I have millions of rows) and stops unexpectedly when there is an NA in my raw data. The computation for one row is clearly independent of all the others, so it should be possible to run them simultaneously.
Is there a way to compute the rows in parallel, using the `apply` or `map` functions or some other method? Simple examples of `apply` and `map` don't help, since `osrmRoute` is quite a complicated function.
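To make the goal concrete, here is a minimal sketch of the row-wise pattern I'm after. It iterates over row indices so that each call still sees a one-row data frame, and it returns NA for incomplete rows instead of stopping. `fake_route()` and `route_row()` are hypothetical names, and `fake_route()` is only a stand-in for `osrmRoute` (it returns a straight-line distance in degrees) so that the sketch runs without a routing server. Whether the same shape works with the real function is an assumption on my part.

```r
# Toy copy of the coordinates from the question, with an NA added in row 3.
mydata <- data.frame(
  longitude_o = c(-122.2925, -122.2820, NA),
  latitude_o  = c(47.72932, 47.73027, 47.72512),
  longitude_d = c(-122.2820, -122.2944, -122.3153),
  latitude_d  = c(47.73027, 47.72293, 47.71490)
)

# Hypothetical stand-in for osrmRoute(): straight-line distance in degrees.
fake_route <- function(src, dst) {
  sqrt(sum((unlist(dst) - unlist(src))^2))
}

# Process one row, addressed by its index.
route_row <- function(ix, geofile) {
  row <- geofile[ix, ]
  if (anyNA(row)) return(NA_real_)  # skip incomplete rows instead of stopping
  tryCatch(
    fake_route(src = row[, c("longitude_o", "latitude_o")],
               dst = row[, c("longitude_d", "latitude_d")]),
    error = function(e) NA_real_    # a failing row yields NA, the rest continue
  )
}

# Sequential version; each index is handled independently of the others.
res <- unlist(lapply(seq_len(nrow(mydata)), route_row, geofile = mydata))
```

Because every index is handled independently, I would expect `lapply` here to be replaceable by `parallel::mclapply` (on Unix-like systems) with the same arguments plus `mc.cores`, but I have not confirmed that this is a sensible way to call a routing server.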
I tried the following:
biroute <- function(geofile, ix=1) {
osrmRoute(src = geofile[ix, c('longitude_o', 'latitude_o')],
dst = geofile[ix, c('longitude_d', 'latitude_d')])
}
route <- apply(mydata, 1, biroute)
but an error occurs when executing the `osrmRoute` function, saying "incorrect number of dimensions".
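My current guess at the cause, as a small sketch that does not need osrm: `apply(X, 1, f)` converts the data frame to a matrix and hands each row to `f` as a plain named vector, so two-index subsetting such as `geofile[ix, ...]` inside the function has no second dimension to work with. The variable names below (`seen`, `msg`, `src`) are only for illustration.

```r
# One row of the table from the question.
mydata <- data.frame(longitude_o = -122.2925, latitude_o = 47.72932,
                     longitude_d = -122.2820, latitude_d = 47.73027)

# Inside apply(), each row arrives as a vector without a dim attribute.
seen <- apply(mydata, 1, function(geofile) is.null(dim(geofile)))

# Two-index subsetting on that vector fails; capture the message instead of stopping.
msg <- tryCatch(
  apply(mydata, 1, function(geofile) geofile[1, c("longitude_o", "latitude_o")]),
  error = function(e) conditionMessage(e)
)

# Subsetting by name alone works on the vector.
src <- apply(mydata, 1, function(geofile) geofile[c("longitude_o", "latitude_o")])
```

If that reading is right, `msg` should hold the same message as the error above, and the function passed to `apply` would have to index the row by name only, or be applied over row indices instead.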