I am new to using R and apply and I am trying to download a set of .csv files from a website.

I want to download the years 2004 and 2005 (as an example; I want more years in fact) of three countries, Guatemala (GT), El Salvador (SV), and Honduras (HN).

I could run country by country something like this:

years = c(2004, 2005)    
Map(download.file, url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", years, "/DEUAGT%20S1%20", years, ".csv"), 
          destfile = paste0(raw_data, years, ".csv") )

This gets me the Guatemalan data for 2004 and 2005, since the Guatemalan files are identified by "DEUAGT" in the URL. The Honduran and Salvadoran files use "DEUAHN" and "DEUASV", respectively.

But since I'm trying to learn, I wanted to do everything in one run. So I tried:

countries = c("GT", "HN", "SV")
years = c(2004, 2005, 2007, 2009:2019)

Map(possibly(download.file, otherwise = NA), url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", years, "/DEUA", countries, "%20", years, ".csv"), 
              destfile = paste0(raw_data, countries, years,".csv"))

But instead of downloading the 6 files I wanted (three countries, two years), it downloaded 2 files.

Various posts here and on the RStudio Community forum note that Map/mapply does not run through all possible combinations of "countries" and "years"; instead, it pairs the elements position by position (or something similar).

I found various suggestions in different settings but none particularly easy, and something tells me there is an easy solution for this. Using expand.grid creates a data frame and not a list of lists.
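
To make the pairing behaviour concrete, here is a minimal sketch with no downloads involved. It assumes only the two vectors from the question (shortened to two years); the column names `country` and `year` are chosen for illustration. Note that in the call above the recycling already happens inside paste0, before Map ever sees the vectors.

```r
countries <- c("GT", "HN", "SV")
years <- c(2004, 2005)

# paste0() (like Map/mapply) pairs elements by position and recycles the
# shorter vector, so three countries and two years give three strings.
pairwise <- paste0(countries, years)
print(pairwise)

# expand.grid() returns a data frame with one row per combination;
# its columns can then be passed to paste0() or Map() as parallel vectors.
grid <- expand.grid(country = countries, year = years,
                    stringsAsFactors = FALSE)
crossed <- paste0(grid$country, grid$year)
print(crossed)
```

The first call yields "GT2004" "HN2005" "SV2004", while the second covers all six country-year combinations.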

Anoushiravan R
jpugliese
  • You would like to download from url of all possible combinations of `countries` and `years`? – Anoushiravan R Jul 20 '21 at 19:46
  • Yes! But because of the URL of the file, I cannot make a list "GT2012, GT2013" and so on. I need something that loops through all years for one country, then repeats for the other, etc. – jpugliese Jul 20 '21 at 19:47

1 Answer

You can use the following solution. Because download.file is called for its side effect, purrr::walk2 is a better fit here than purrr::map2:

library(purrr)

# First we create a data frame of all combinations of countries and years
comb <- expand.grid(countries, years)

# Then I wrap `download.file` with possibly for error handling
poss_download <- possibly(download.file, otherwise = NA)

# Then I apply our function on every combination of countries and years 
# in a row-wise operation

walk2(comb$Var1, comb$Var2, ~ {
  url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", .y, "/DEUA", .x, "%20", .y, ".csv")
  destfile = paste0(raw_data, .x, .y,".csv")
  poss_download(url, destfile)
})
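
Before running the real downloads, it may help to preview the URLs that the call above would request. The following is only a dry-run sketch: `build_url` is a hypothetical helper introduced here, the vectors are shortened to two years, and `map2_chr` is used instead of `walk2` because the goal is to keep the resulting strings.

```r
library(purrr)

countries <- c("GT", "HN", "SV")
years <- c(2004, 2005)
comb <- expand.grid(countries, years, stringsAsFactors = FALSE)

# Hypothetical helper: builds the URL for one country/year combination,
# using the same pattern as the walk2() call above.
build_url <- function(country, year) {
  paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/",
         year, "/DEUA", country, "%20", year, ".csv")
}

# One URL per row of comb; nothing is downloaded here.
urls <- map2_chr(comb$Var1, comb$Var2, build_url)
print(urls)
```

If the printed URLs look right, the same two columns can be handed to walk2 as shown above.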

Here is a base R solution for this question.

  • Instead of paste0 I used sprintf, which according to its documentation "returns a character vector containing a formatted combination of text and variable values". I used %d for the integer/numeric values (twice, for the years) and %s for the character string (once, for the country). Note that sprintf needs one argument per placeholder, in the same order; because it is vectorised, it returns one URL per row of comb.
  • Then I used tryCatch in place of purrr::possibly to handle possible errors.
  • Finally, I used mapply (or Map) to iterate over the url and destfile vectors in parallel.
comb <- expand.grid(countries, years)

# "%%" is a literal percent sign in sprintf, so "%%20" yields the "%20" used in the file names
url <- sprintf("https://www.colef.mx/emif/datasets/basesdeDatos/sur/%d/DEUA%s%%20%d.csv", comb$Var2, comb$Var1, comb$Var2)

destfile <- paste0(raw_data, comb$Var1, comb$Var2, ".csv")

# x is a single URL and y is its matching destination file
mapply(function(x, y) {
  tryCatch(download.file(x, y),
           error = function(e) {
             NA
           })
}, url, destfile)
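
Since failed downloads are turned into NA rather than stopping the loop, it can be useful to check afterwards which files are missing. The sketch below shows one way to do that under stated assumptions: `fake_download` is a hypothetical stand-in for download.file (which returns 0 on success), and the example.org URLs are placeholders, not real data files.

```r
# Hypothetical stand-in for download.file(): errors for one URL and
# returns 0L otherwise, mimicking download.file's success code.
fake_download <- function(url, destfile) {
  if (grepl("HN", url, fixed = TRUE)) stop("cannot open URL")
  0L
}

# Placeholder inputs for illustration only.
url <- c("https://example.org/GT2004.csv", "https://example.org/HN2004.csv")
destfile <- c("GT2004.csv", "HN2004.csv")

# Same mapply/tryCatch pattern as above, but the result is kept.
status <- mapply(function(u, d) {
  tryCatch(fake_download(u, d), error = function(e) NA_integer_)
}, url, destfile)

# mapply names the result after its first character argument (the URLs),
# so the failed URLs can be read off directly.
failed <- names(status)[is.na(status)]
print(failed)
```

With the real download.file in place of the stand-in, `failed` would list the URLs worth retrying or checking by hand.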
Anoushiravan R
  • That does it! Thank you very much. There is a double comma inside paste0, but apart from that, it runs as is. I really thought there would be one function that just runs through every combination. – jpugliese Jul 20 '21 at 20:06
  • Glad it helped. I didn't notice that! Where is it exactly, so that I can edit it? I will also present a base R solution. I noticed you used `possibly`, so I thought you were familiar with `purrr` and went along with it. – Anoushiravan R Jul 20 '21 at 20:09
  • In brackets: paste0(raw_data, {,} .x, .y,".csv") – jpugliese Jul 20 '21 at 20:14
  • @Anoushiravan. This is fantastic. You are my `purrr` man!!! – TarJae Jul 20 '21 at 20:38
  • @jpugliese Thank you, I edited it and also added a base R solution. – Anoushiravan R Jul 20 '21 at 20:43
  • @TarJae Thank you, that's very kind of you. No, come on! I just saw that the OP used `possibly` inside base R code and decided to go with it lol. – Anoushiravan R Jul 20 '21 at 20:44