
I am following a vignette for gtfstools (https://cran.r-project.org/web/packages/gtfstools/vignettes/gtfstools.html) but am getting stuck with the data format. Basically, I am linking to a gtfs dataset, which is a zip folder with .txt files inside it.

ART2019Path <- file.path(GTFS_path, "2019-10 Arlington.zip")
ART2019GTFS <- read_gtfs(ART2019Path) 

Here is the data: https://realtime.commuterpage.com/rtt/public/utility/gtfs.aspx

The data loads fine but it is automatically read as all characters. I need most of the data to be numeric for my data analysis purposes. For example, showing transit geometry:

trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
plot(trip_geom$geometry)

I tried mutating all data, assuming data without numbers would stay as characters, but it didn't work:

ART2019GTFS <- mutate_all(ART2019GTFS, funs(as.numeric))

I am relatively new to R so not sure how to tackle this.

Any help figuring this out would be appreciated.

1 Answer


When I follow that link I get a zip file named google_transit.zip, which contains several comma-separated text files. When I run this:

ART2019GTFS <- read_gtfs("~/google_transit.zip") 

I get this (one dataframe for each text file):

> str(ART2019GTFS)
List of 8
 $ agency        :Classes ‘data.table’ and 'data.frame':    1 obs. of  6 variables:
  ..$ agency_id      : chr "1"
  ..$ agency_name    : chr "Arlington Transit"
  ..$ agency_url     : chr "http://www.arlingtontransit.com"
  ..$ agency_phone   : chr "703-228-7433"
  ..$ agency_timezone: chr "America/New_York"
  ..$ agency_lang    : chr "en"
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ calendar      :Classes ‘data.table’ and 'data.frame':    5 obs. of  10 variables:
  ..$ service_id: chr [1:5] "1" "2" "3" "4" ...
  ..$ monday    : int [1:5] 1 0 1 0 0
  ..$ tuesday   : int [1:5] 1 0 1 0 0
  ..$ wednesday : int [1:5] 1 0 1 0 0
  ..$ thursday  : int [1:5] 1 0 1 0 0
  ..$ friday    : int [1:5] 0 1 1 0 0
  ..$ saturday  : int [1:5] 0 0 0 1 0
  ..$ sunday    : int [1:5] 0 0 0 0 1
  ..$ start_date: Date[1:5], format: "2022-03-27" "2022-03-27" "2022-03-27" ...
  ..$ end_date  : Date[1:5], format: "2023-12-31" "2023-12-31" "2023-12-31" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ calendar_dates:Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
  ..$ service_id    : chr [1:3] "1" "3" "5"
  ..$ date          : Date[1:3], format: "2022-05-30" "2022-05-30" "2022-05-30"
  ..$ exception_type: int [1:3] 2 2 1
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ routes        :Classes ‘data.table’ and 'data.frame':    21 obs. of  8 variables:
  ..$ route_id        : chr [1:21] "41" "42" "43" "45" ...
  ..$ agency_id       : chr [1:21] "1" "1" "1" "1" ...
  ..$ route_short_name: chr [1:21] "41" "42" "43" "45" ...
  ..$ route_long_name : chr [1:21] "Columbia Pike-Ballston-Court House" "Ballston-Pentagon" "Crystal City-Courthouse" "Columbia Pike-DHS/Sequoia-Rosslyn" ...
  ..$ route_type      : int [1:21] 3 3 3 3 3 3 3 3 3 3 ...
  ..$ route_color     : chr [1:21] "DCC154" "D7171F" "BC1B8D" "0084CA" ...
  ..$ route_text_color: chr [1:21] "FFFFFF" "FFFFFF" "FFFFFF" "FFFFFF" ...
  ..$ route_url       : chr [1:21] "https://www.arlingtontransit.com/routes-schedules/art-41/" "https://www.arlingtontransit.com/routes-schedules/art-42/" "https://www.arlingtontransit.com/routes-schedules/art-43/" "https://www.arlingtontransit.com/routes-schedules/art-45/" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ shapes        :Classes ‘data.table’ and 'data.frame':    10721 obs. of  4 variables:
  ..$ shape_id         : chr [1:10721] "9" "9" "9" "9" ...
  ..$ shape_pt_lon     : num [1:10721] -77.1 -77.1 -77.1 -77.1 -77.1 ...
  ..$ shape_pt_lat     : num [1:10721] 38.9 38.9 38.9 38.9 38.9 ...
  ..$ shape_pt_sequence: int [1:10721] 1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ stop_times    :Classes ‘data.table’ and 'data.frame':    57711 obs. of  7 variables:
  ..$ trip_id       : chr [1:57711] "1" "1" "1" "1" ...
  ..$ arrival_time  : chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
  ..$ departure_time: chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
  ..$ stop_id       : chr [1:57711] "138" "141" "867" "144" ...
  ..$ stop_sequence : int [1:57711] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ stop_headsign : chr [1:57711] "" "" "" "" ...
  ..$ timepoint     : int [1:57711] 1 0 0 1 0 0 0 0 0 0 ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ stops         :Classes ‘data.table’ and 'data.frame':    640 obs. of  6 variables:
  ..$ stop_id  : chr [1:640] "83" "85" "87" "89" ...
  ..$ stop_code: chr [1:640] "51001" "51003" "51005" "51007" ...
  ..$ stop_name: chr [1:640] "Ballston Metro G, Fairfax Dr, EB @ N Stafford, NS" "Fairfax Drive, WB @ N Utah Street, FS" "16th Street N, WB @ N Glebe Road, FS" "16th Street N, WB @ N Buchanan Street, NS" ...
  ..$ stop_lat : num [1:640] 38.9 38.9 38.9 38.9 38.9 ...
  ..$ stop_lon : num [1:640] -77.1 -77.1 -77.1 -77.1 -77.1 ...
  ..$ stop_url : chr [1:640] "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51001#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51003#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51005#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51007#realTimeResultsContainer" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ trips         :Classes ‘data.table’ and 'data.frame':    2296 obs. of  7 variables:
  ..$ route_id     : chr [1:2296] "52" "52" "52" "52" ...
  ..$ service_id   : chr [1:2296] "3" "3" "3" "3" ...
  ..$ trip_id      : chr [1:2296] "1" "2" "3" "4" ...
  ..$ trip_headsign: chr [1:2296] "Ballston Metro" "Ballston Metro" "Ballston Metro" "Ballston Metro" ...
  ..$ direction_id : int [1:2296] 0 0 0 0 0 1 1 1 1 1 ...
  ..$ block_id     : chr [1:2296] "5202" "5202" "5202" "5202" ...
  ..$ shape_id     : chr [1:2296] "76" "76" "76" "76" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "class")= chr [1:3] "dt_gtfs" "gtfs" "list"
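A more compact way to check the column types than reading the full str() output might be to list the first class of every column, table by table. This is only a sketch in base R: `toy_gtfs` is a hypothetical stand-in for the GTFS object, built by hand so the snippet runs without the zip file.

```r
# Toy stand-in for a GTFS object: a named list of tables (hypothetical data).
toy_gtfs <- list(
  shapes = data.frame(
    shape_id = c("9", "9"),
    shape_pt_lat = c(38.9, 38.9),
    shape_pt_sequence = 1:2,
    stringsAsFactors = FALSE
  ),
  trips = data.frame(
    trip_id = c("1", "2"),
    direction_id = c(0L, 1L),
    stringsAsFactors = FALSE
  )
)

# For each table, report the first class of every column.
column_classes <- lapply(toy_gtfs, function(tbl) {
  vapply(tbl, function(col) class(col)[1], character(1))
})

print(column_classes)
```

With the real data, replacing `toy_gtfs` with `ART2019GTFS` should give the same kind of summary, which makes it easy to spot a table whose coordinates or sequence numbers came in as character.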

And then this apparently succeeds:

> trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
> str(trip_geom)
Classes ‘sf’, ‘data.table’ and 'data.frame':    2296 obs. of  3 variables:
 $ trip_id    : chr  "1" "2" "3" "4" ...
 $ origin_file: chr  "shapes" "shapes" "shapes" "shapes" ...
 $ geometry   :sfc_LINESTRING of length 2296; first list element:  'XY' num [1:131, 1:2] -77.2 -77.2 -77.2 -77.2 -77.2 ...
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
  ..- attr(*, "names")= chr [1:2] "trip_id" "origin_file"
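
As for the mutate_all attempt in the question: the object returned by read_gtfs is a list of tables (its class is "dt_gtfs" "gtfs" "list"), not a single data frame, so a data-frame verb cannot be applied to it as a whole. If some columns ever do arrive as character, one option is to convert named columns one table at a time. The following is a rough base R sketch, not the package's own method; `toy_gtfs` and `numeric_cols` are hypothetical names introduced for illustration.

```r
# Toy stand-in for a GTFS object whose coordinates were read as character
# (hypothetical data).
toy_gtfs <- list(
  shapes = data.frame(
    shape_id = c("9", "9"),
    shape_pt_lat = c("38.9", "38.8"),
    shape_pt_lon = c("-77.1", "-77.2"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical lookup: which columns of which table should be numeric.
# Identifier columns such as shape_id are left out so they stay character.
numeric_cols <- list(shapes = c("shape_pt_lat", "shape_pt_lon"))

# Convert only the listed columns, one table at a time.
for (tbl_name in names(numeric_cols)) {
  for (col_name in numeric_cols[[tbl_name]]) {
    toy_gtfs[[tbl_name]][[col_name]] <-
      as.numeric(toy_gtfs[[tbl_name]][[col_name]])
  }
}

str(toy_gtfs$shapes)
```

Converting an explicit list of columns avoids turning identifiers or text fields into NA, which is what a blanket as.numeric over every column would do.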
IRTFM
  • Thank you for demonstrating this. Knowing it worked with a fresh download I tried again and got it to work. The issue I had before was that I had unzipped the files to look at the txt files individually and then zipped them up again. For some reason, that changed the way the data were interpreted by R. Thanks again for your help! – TransitHarmony Apr 07 '22 at 12:10