
I just came across this powerful R package but unfortunately haven't been able to find out how to fetch and parse a list of URLs in parallel when the responses are JSON.

As a simple example, suppose I have a list of cities (in Switzerland):

list_cities <- c("Winterthur", "Bern", "Basel", "Lausanne", "Lugano")

As a next step, I'd like to find public transport connections from each of the listed cities to the city of Zurich. I can use the following transport API to query public timetable data:

https://transport.opendata.ch

Using the httr package, I can make a request for each city as follows:

library(httr)

for (city in list_cities) {
   r <- GET(paste0("http://transport.opendata.ch/v1/connections?from=", city, "&to=Zurich&limit=1&fields[]=connections/duration"))
   cont <- content(r, as = "parsed", type = "application/json", encoding = "UTF-8")
}

to get the duration of each journey (note that each iteration overwrites cont; the loop is only meant to illustrate the request). However, I have a much longer list of cities and more destinations, which is why I am looking for a way to make multiple requests in parallel.
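For reference, with fields[]=connections/duration the parsed content should reduce to a nested list holding the duration. A minimal sketch of pulling the value out, using a hand-built list in place of a live response (the exact nesting and the "00d00:56:00" value are assumptions for illustration, not output from the API):

```r
# Hypothetical parsed response; the real API may nest additional fields
cont <- list(connections = list(list(duration = "00d00:56:00")))

# Pull the duration of the first (and, with limit=1, only) connection
duration <- cont$connections[[1]]$duration
```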

Patrick Balada
1 Answer


Note: I have not tested this, but first you would initialize your parallel workers:

library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterEvalQ(cl, { library(httr) }) # load required packages onto each parallel worker (httr provides GET and content)

Make a function with your relevant commands:

custom_parse_json <- function(city) {
    r <- GET(paste0("http://transport.opendata.ch/v1/connections?from=", city, "&to=Zurich&limit=1&fields[]=connections/duration"))
    cont <- content(r, as = "parsed", type = "application/json", encoding = "UTF-8")
    return(cont)
}

Export the function to each parallel worker:

clusterExport(cl, c("custom_parse_json"))

Loop through the list of cities:

results <- parLapply(cl, list_cities, custom_parse_json)
stopCluster(cl) # shut down the workers when done

This should return a list of your JSON content, one element per city.
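To sanity-check the export-then-parLapply plumbing without hitting the network, the same pattern can be run with a trivial function (the square helper here is purely illustrative, standing in for custom_parse_json):

```r
library(parallel)

cl <- makeCluster(2)                # two workers are enough for a demo
square <- function(x) x^2           # stand-in for custom_parse_json
clusterExport(cl, "square")         # same export step as above
res <- parLapply(cl, 1:4, square)   # returns a list, one element per input
stopCluster(cl)                     # always release the workers
```

res comes back as a list in the same order as the input, so with custom_parse_json you would get one parsed JSON object per city, in the order of list_cities.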

CPak