The 'from' and 'to' can be generated by 'shifting' the values from the following row, up to the 'current' row. And the stop information can simply be joined on
Let me explain with an example, and the use of library(data.table)
## here I"m using Melbourne's GTFS ("http://transitfeeds.com/p/ptv/497/latest/download")
#dt_stop_times <- lst[[6]]$stop_times
#dt_stops <- lst[[7]]$stops
#setDT(dt_stop_times)
#setDT(dt_stops)
## join on whatever stop information you want
dt_stop_times <- dt_stop_times[ dt_stops, on = c("stop_id"), nomatch = 0]
## set the order of stops for each group (in this case, each group is a trip_id)
setorder(dt_stop_times, trip_id, stop_sequence)
## create a new column by shifting the stop_id of the following row up
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]
## you will have NAs at this point because the last stop doesn't go anywhere.
## you can do the same operation on multiple columns at the same time
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"),
arrival_time_stop_to = shift(arrival_time, type = "lead"),
departure_time_stop_to = shift(departure_time, type = "lead")),
by = .(trip_id)]
## now you have your 'from' and 'to' columns from which you can make your igraph
## here's a subset of the result
dt_stop_times[, .(trip_id, stop_id, stop_name_from = stop_name, arrival_time, stop_id_to, arrival_time_stop_to)]
# trip_id stop_id stop_name_from arrival_time stop_id_to
# 1: 1.T0.3-86-A-mjp-1.7.R 4174 71-RMIT/Plenty Rd (Bundoora) 25:42:00 4485
# 2: 1.T0.3-86-A-mjp-1.7.R 4485 70-Janefield Dr/Plenty Rd (Bundoora) 25:43:00 4486
# 3: 1.T0.3-86-A-mjp-1.7.R 4486 69-Taunton Dr/Plenty Rd (Bundoora) 25:44:00 4487
# 4: 1.T0.3-86-A-mjp-1.7.R 4487 68-Greenhills Rd/Plenty Rd (Bundoora) 25:45:00 4488
# 5: 1.T0.3-86-A-mjp-1.7.R 4488 67-Bundoora Square SC/Plenty Rd (Bundoora) 25:46:00 4489
# ---
# 9415793: 9999.UQ.3-19-E-mjp-1.1.H 17871 7-Queen Victoria Market/Elizabeth St (Melbourne City) 23:25:00 17873
# 9415794: 9999.UQ.3-19-E-mjp-1.1.H 17873 5-Melbourne Central Station/Elizabeth St (Melbourne City) 23:27:00 17875
# 9415795: 9999.UQ.3-19-E-mjp-1.1.H 17875 3-Bourke Street Mall/Elizabeth St (Melbourne City) 23:30:00 17876
# 9415796: 9999.UQ.3-19-E-mjp-1.1.H 17876 2-Collins St/Elizabeth St (Melbourne City) 23:31:00 17877
# 9415797: 9999.UQ.3-19-E-mjp-1.1.H 17877 1-Flinders Street Railway Station/Elizabeth St (Melbourne City) 23:32:00 NA
# arrival_time_stop_to
# 1: 25:43:00
# 2: 25:44:00
# 3: 25:45:00
# 4: 25:46:00
# 5: 25:47:00
# ---
# 9415793: 23:27:00
# 9415794: 23:30:00
# 9415795: 23:31:00
# 9415796: 23:32:00
# 9415797: NA
Now, to use graph_from_data_frame{igraph}
you just need to:
# get a df with nodes
nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]
# links beetween stops
links <- dt_stop_times[,.(stop_id, stop_id_to, trip_id)]
# create graph
g <- graph_from_data_frame(links , directed=TRUE, vertices=nodes)
Mind you however, that in a GTFS.zip
file you might have more than one transport mode (train, bus, subway etc) and that some pairs of stops have much higher connectivity than others due to variations in service frequency. It's not clear to me yet how these two points should be considered when building a graph from a GTFS.zip
. Probably the way forward would be to weight each edge according to its frequency and build a multilayered network with some stops in common across each transport mode treated as an interdependent layer.