
Does anyone have a trick up their sleeve for downloading GTFS using R when the URL doesn't end with ".zip"? For instance, this works:

download.file(url = "http://www.transperth.wa.gov.au/TimetablePDFs/GoogleTransit/Production/google_transit.zip", destfile = "temp.zip")

But each of the following creates a file of the right size that will not open:

download.file(url = "http://transitfeeds.com/p/ptv/497/latest/download", destfile = "temp.zip")

download.file(url = "http://transitfeeds.com/p/ptv/497/latest/download", destfile = "temp")

I suspect there is something fundamental I need to understand about URLs, but I don't know where to begin looking, so any pointers would be appreciated.

Cheers,

Anthony

Zephyr
Anthony
  • In the transitfeeds example you gave, are you sure there is anything to download? It gives a 404 error. – rdodhia Feb 24 '21 at 07:26

2 Answers


Your link is probably a redirect. Try using the httr package, as described in the question "R download file redirect error":

library(httr)

url <- "http://transitfeeds.com/p/ptv/497/latest/download"
res <- GET(
  url = url,
  write_disk("gtfs.zip"),
  verbose()
)

I was able to download the file and open it. If it works you can remove the verbose() part.
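
To check whether a download really produced a zip archive rather than, say, an HTML error page, something like the following might help. It is only a minimal sketch: `is_zip_file()` and `fetch_gtfs()` are hypothetical helper names, not part of httr, and the check only looks at the four-byte signature that non-empty zip files start with.

```r
# Hypothetical helper: TRUE if the file starts with the signature
# bytes "PK\003\004" found at the beginning of non-empty zip archives.
is_zip_file <- function(path) {
  sig <- readBin(path, what = "raw", n = 4L)
  identical(sig, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Hypothetical wrapper around the GET() call above: download to disk,
# fail on HTTP errors, then verify that the result looks like a zip file.
fetch_gtfs <- function(url, destfile) {
  res <- httr::GET(url, httr::write_disk(destfile, overwrite = TRUE))

  # Raises an R error for 4xx/5xx responses (for example a 404).
  httr::stop_for_status(res)

  if (!is_zip_file(destfile)) {
    stop(
      "Downloaded file does not look like a zip archive. ",
      "Content type: ", httr::http_type(res), "; final URL: ", res$url
    )
  }
  invisible(destfile)
}

# Usage (requires network access and the httr package):
# fetch_gtfs("http://transitfeeds.com/p/ptv/497/latest/download", "gtfs.zip")
```

If you would rather stay with download.file(), passing mode = "wb" may also be worth trying. On Windows, download.file() only switches to binary mode by itself when the URL ends in an extension such as .zip, so a URL without one can end up being written in text mode, which corrupts the archive.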

kukuk1de
  • Thank you @kukuk1de, "redirect" was the term I needed, the httr library seems much better suited for the task, and the verbose object will come in handy from now on! I owe you a beer! – Anthony Feb 26 '21 at 05:21
  • @Anthony, good to hear your problem is solved. If you think my answer was helpful please accept it as solution. And thanks for the beer :-) – kukuk1de Feb 26 '21 at 06:21

@kukuk1de's answer does the trick.

I'd also note that transitfeeds links to the official download URL. The link is located on the right, under "About This GTFS Feed".

Then you can right-click and select "Copy Link Location", which will give you the official URL with a .zip extension, which you can use in conjunction with download.file().

However, this specific URL points to a .zip that contains many folders, each one holding its own GTFS .zip, rather than a single archive in the GTFS format.

If it were an actual GTFS .zip file, you would be able to read it directly with either {gtfstools} or {tidytransit}, but the nested structure does not allow that. Check it out:

tmp <- tempfile(pattern = "gtfs", fileext = ".zip")

download.file(
    "http://data.ptv.vic.gov.au/downloads/gtfs.zip", 
    destfile = tmp
)

zip::zip_list(tmp)
#>                 filename compressed_size uncompressed_size           timestamp
#> 1                     1/               0                 0 2021-02-22 19:23:20
#> 2                    10/               0                 0 2021-02-22 19:23:20
#> 3  10/google_transit.zip            3231              4011 2021-02-22 19:09:56
#> 4                    11/               0                 0 2021-02-22 19:23:20
#> 5  11/google_transit.zip           29966             32109 2021-02-22 19:10:12
#> 6   1/google_transit.zip         7262254           7625276 2021-02-22 19:01:56
#> 7                     2/               0                 0 2021-02-22 19:23:20
#> 8   2/google_transit.zip         5667379           6269932 2021-02-22 19:03:34
#> 9                     3/               0                 0 2021-02-22 19:23:20
#> 10  3/google_transit.zip         6714271           7782585 2021-02-22 19:05:04
#> 11                    4/               0                 0 2021-02-22 19:23:20
#> 12  4/google_transit.zip        66336783          67508547 2021-02-22 19:23:16
#> 13                    5/               0                 0 2021-02-22 19:23:20
#> 14  5/google_transit.zip        27834469          27962731 2021-02-22 19:06:16
#> 15                    6/               0                 0 2021-02-22 19:23:20
#> 16  6/google_transit.zip        13730731          14172729 2021-02-22 19:09:10
#> 17                    7/               0                 0 2021-02-22 19:23:20
#> 18  7/google_transit.zip           46932             50417 2021-02-22 19:09:24
#> 19                    8/               0                 0 2021-02-22 19:23:20
#> 20  8/google_transit.zip          574316            580906 2021-02-22 19:09:42

Let's say you want to read the GTFS file inside the 1/ folder. Then you can unzip this file with zip::unzip():

tmpd <- file.path(tempdir(), "tmp_gtfs")
dir.create(tmpd)

zip::unzip(tmp, files = "1/google_transit.zip", exdir = tmpd)

list.files(tmpd)
#> [1] "1"
list.files(file.path(tmpd, "1"))
#> [1] "google_transit.zip"

Then read it with either {gtfstools} or {tidytransit}, depending on what you want to do with the file:

gtfs_path <- file.path(tmpd, "1", "google_transit.zip")

gt_gtfs <- gtfstools::read_gtfs(gtfs_path)
names(gt_gtfs)
#> [1] "agency"         "routes"         "trips"          "stops"         
#> [5] "calendar"       "calendar_dates" "shapes"         "stop_times"

tt_gtfs <- tidytransit::read_gtfs(gtfs_path)
names(tt_gtfs)
#> [1] "agency"         "routes"         "trips"          "stops"         
#> [5] "calendar"       "calendar_dates" "shapes"         "stop_times"
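
If you need every nested feed rather than only the one in 1/, the same steps can be wrapped in a loop. This is just a sketch under the assumption that each folder holds a file named google_transit.zip, as in the listing above; `nested_feed_entries()` and `read_nested_feeds()` are hypothetical helper names, and the feeds are returned as a list, not merged.

```r
# Hypothetical helper: given a listing such as the one returned by
# zip::zip_list(), keep only the entries that are nested GTFS archives.
nested_feed_entries <- function(listing, feed_name = "google_transit.zip") {
  is_feed <- basename(listing$filename) == feed_name
  listing$filename[is_feed]
}

# Hypothetical helper: extract every nested feed and read each one with
# tidytransit. Returns a list named after the folder each feed came from.
read_nested_feeds <- function(zip_path,
                              exdir = file.path(tempdir(), "nested_gtfs")) {
  entries <- nested_feed_entries(zip::zip_list(zip_path))

  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  zip::unzip(zip_path, files = entries, exdir = exdir)

  feeds <- lapply(file.path(exdir, entries), tidytransit::read_gtfs)
  names(feeds) <- dirname(entries)
  feeds
}

# Usage (requires the zip and tidytransit packages and the `tmp` file
# downloaded above):
# feeds <- read_nested_feeds(tmp)
# names(feeds)
```

Bear in mind that some of the nested archives are large (the one in 4/ is over 60 MB compressed), so reading all of them can take a while.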
dhersz
  • Thanks for looking into this. My aim has been to get GTFS for multiple Australian cities, but unfortunately the official Melbourne feed is split by provider and/or mode. The "latest version" on transitfeeds.com has already merged the Melbourne GTFS, so it simplifies the remaining processing in my script. Yes, the tidytransit library is what I've been using, as its documentation has been much more helpful than that of the other GTFS R libraries. – Anthony Feb 26 '21 at 05:31