I have a function whose objective is to fetch daily data for each value in a column of a data.frame. The range is a complete month, but it could be any other range.

My df has a column unit_id, so I need my function to take the first id in unit_id and fetch the data for every single date in March.

| unit | unit_id |
|:-----:|----------|
|  AE   |    123   |
|  AD   |    456   |
|  AN   |    789   |

But right now, my function cycles through the ids in the unit_id column. Since I have 3 ids, on the 4th day the function uses the 1st id again, on the 5th day it uses the 2nd id, and so on. This repeats until the last day of the month.

I need it to use each id for every day of the month.

code:

my_dates <- seq(as.Date("2020-03-01"), as.Date("2020-03-31"), by = 1)

my_fetch <- function(unit, unit_id, d) {


  df <- google_analytics(unit_id,
                         date_range = c(d, d),
                         metrics = c("totalEvents"),
                         dimensions = c("ga:date", "ga:eventCategory", "ga:eventAction", "ga:eventLabel"),
                         anti_sample = TRUE)

  df$unidad_de_negocio <- unit


  filename <- paste0(unit, "-", "total-events", "-", d, ".csv")
  path <- "D:\\america\\costos_protv\\total_events"
  write.csv(df, file.path(path, filename), row.names = FALSE)
  print(filename)
  rm(df)
  gc()


}




monthly_fetches <- mapply(my_fetch, df$unit,
                          df$unit_id,
                          my_dates, SIMPLIFY = FALSE)

Variation 2: By monthly ranges

Thank you, Akrun. Your answer works.

I've been trying to edit it to use it in this other, similar scenario:

1.- Monthly starts and ends: Now the loop doesn't use a single date, but a start and an end. I've called this monthly_dates

|    starts   |    ends    |
|:-----------:|------------|
|  2020-02-01 | 2020-02-29 |
|  2020-03-01 | 2020-03-31 |

I've tried to adapt the solution, but it is not working. Could you look at it and tell me why? Thank you.

monthly_fetches <- Map(function(x, y) 
                   lapply(monthly_dates, function(d1, d2) my_fetch(x, y, monthly_dates$starts, monthly_dates$ends)))

Main function adapted to use 2 dates (start "d1" and end "d2"):

my_fetch <- function(udn, udn_id, d1, d2) {

    df <- google_analytics(udn_id,
                           date_range = c(d1, d2),
                           metrics = c("totalEvents"),
                           dimensions = c("ga:month"),
                           anti_sample = TRUE)

    df$udn <- udn
    df$udn_id <- udn_id

    df

}

**Code to make the monthly date ranges:**

make_date_ranges <- function(start, end){

  starts <- seq(from = start,
                to =  Sys.Date()-1 ,
                by = "1 month")

  ends <- c((seq(from = add_months(start, 1),
                 to = end,
                 by = "1 month" ))-1,
            (Sys.Date()-1))

  data.frame(starts,ends)

}

## usage
monthly_dates <- make_date_ranges(as.Date("2020-02-01"), Sys.Date())

Update 1:

dput(monthly_fetches[1])

list(AE = list(structure(list(month = "02", totalEvents = 19670334, 
    udn = "AE", udn_id = 74415341), row.names = 1L, totals = list(
    list(totalEvents = "19670334")), minimums = list(list(totalEvents = "19670334")), maximums = list(
    list(totalEvents = "19670334")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"), 
    structure(list(month = "03", totalEvents = 19765253, udn = "AE", 
        udn_id = 74415341), row.names = 1L, totals = list(list(
        totalEvents = "19765253")), minimums = list(list(totalEvents = "19765253")), maximums = list(
        list(totalEvents = "19765253")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"), 
    structure(list(month = "04", totalEvents = 1319087, udn = "AE", 
        udn_id = 74415341), row.names = 1L, totals = list(list(
        totalEvents = "1319087")), minimums = list(list(totalEvents = "1319087")), maximums = list(
        list(totalEvents = "1319087")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame")))

Update 2:

dput(monthly_fetches[[1]])

list(structure(list(month = "02", totalEvents = 19670334, udn = "AE", 
    udn_id = 74415341), row.names = 1L, totals = list(list(totalEvents = "19670334")), minimums = list(
    list(totalEvents = "19670334")), maximums = list(list(totalEvents = "19670334")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"), 
    structure(list(month = "03", totalEvents = 19765253, udn = "AE", 
        udn_id = 74415341), row.names = 1L, totals = list(list(
        totalEvents = "19765253")), minimums = list(list(totalEvents = "19765253")), maximums = list(
        list(totalEvents = "19765253")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"), 
    structure(list(month = "04", totalEvents = 1319087, udn = "AE", 
        udn_id = 74415341), row.names = 1L, totals = list(list(
        totalEvents = "1319087")), minimums = list(list(totalEvents = "1319087")), maximums = list(
        list(totalEvents = "1319087")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"))
Omar Gonzales

1 Answer

Map/mapply pairs its arguments element by element and recycles the shorter ones. Here the columns of 'df' have 3 elements and 'my_dates' has 31, so the ids get recycled across the dates. One option is to loop over the 'df' columns with Map/mapply and run a further loop over the dates inside it

monthly_fetches <- Map(function(x, y) 
                 lapply(my_dates, function(date) my_fetch(x, y, date)),
                    df$unit, df$unit_id)

Or we can have the outer loop over 'my_dates'

lapply(my_dates, function(date) Map(my_fetch, df$unit, df$unit_id, date))
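
To make the difference between the two approaches concrete, here is a minimal sketch that swaps the real API call for a hypothetical stub, `fake_fetch` (not part of the question's code), and uses a shortened date range. The `units` and `dates` objects are made up for illustration.

```r
# Hypothetical stand-in for my_fetch(): returns a label instead of calling the API.
fake_fetch <- function(unit, unit_id, d) paste(unit, unit_id, format(d))

units <- data.frame(unit = c("AE", "AD", "AN"),
                    unit_id = c(123, 456, 789),
                    stringsAsFactors = FALSE)
dates <- seq(as.Date("2020-03-01"), as.Date("2020-03-06"), by = 1)

# Element-wise pairing: the 3 ids are recycled across the 6 dates,
# so each id is only combined with every third date.
paired <- mapply(fake_fetch, units$unit, units$unit_id, dates, USE.NAMES = FALSE)

# Nested loops: every id is combined with every date (3 x 6 = 18 calls).
nested <- Map(function(x, y) lapply(dates, function(d) fake_fetch(x, y, d)),
              units$unit, units$unit_id)

length(paired)   # number of calls made by the element-wise version
lengths(nested)  # number of calls per unit in the nested version
```

With the stub, `paired[4]` is built from the first unit again, which is the recycling described in the question, while `nested` holds one result per date for each unit.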

Update

If we need to pass two columns, use Map

Map(function(start, end) 
  Map(my_fetch, df$unit, df$unit_id, start, end),  
            monthly_dates$starts, monthly_dates$ends)

Or

monthly_fetches <- Map(function(x, y) Map(function(start, end) 
   my_fetch(x, y, start, end),
      monthly_dates$starts, monthly_dates$ends), df$unit, df$unit_id)
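
As a rough check of the shape this returns, the sketch below runs the same nested `Map` with a hypothetical `stub_fetch` that only echoes its arguments back as a one-row data frame. The `units` and `ranges` objects are invented stand-ins for `df` and `monthly_dates`.

```r
# Hypothetical stub replacing my_fetch(); it makes no API call.
stub_fetch <- function(udn, udn_id, d1, d2) {
  data.frame(udn = udn, udn_id = udn_id, start = d1, end = d2)
}

units <- data.frame(unit = c("AE", "AD"),
                    unit_id = c(123, 456),
                    stringsAsFactors = FALSE)
ranges <- data.frame(starts = as.Date(c("2020-02-01", "2020-03-01")),
                     ends   = as.Date(c("2020-02-29", "2020-03-31")))

# Outer Map walks the units; inner Map walks the start/end pairs in parallel.
res <- Map(function(x, y)
             Map(function(start, end) stub_fetch(x, y, start, end),
                 ranges$starts, ranges$ends),
           units$unit, units$unit_id)

names(res)    # one list entry per unit
lengths(res)  # one data frame per date range inside each entry
```

The result is a list of lists, one inner list per unit, which is why a single `do.call(rbind, ...)` on the outer list is not enough to get one data frame.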

Then rbind

do.call(rbind,lapply(monthly_fetches, function(x) do.call(rbind, x)))
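
The two-level `rbind` matters because the result is a list of lists. A small sketch with made-up data (`toy_fetches` is a hypothetical stand-in for `monthly_fetches`) shows what happens when only the outer level is bound:

```r
# Toy stand-in for the nested result: one inner list of one-row data frames per unit.
# All values here are invented for illustration.
toy_fetches <- list(
  AE = list(data.frame(month = "02", totalEvents = 10, udn = "AE"),
            data.frame(month = "03", totalEvents = 20, udn = "AE")),
  AD = list(data.frame(month = "02", totalEvents = 30, udn = "AD"),
            data.frame(month = "03", totalEvents = 40, udn = "AD"))
)

# Binding only the outer level treats each inner list as a row of list cells,
# which yields a list-matrix rather than a data frame.
outer_only <- do.call(rbind, toy_fetches)
is.matrix(outer_only)

# Binding the inner lists first, then the per-unit data frames, gives a data frame.
total <- do.call(rbind, lapply(toy_fetches, function(x) do.call(rbind, x)))
is.data.frame(total)
nrow(total)
```

This is consistent with the matrix reported in the comments when `do.call(rbind, monthly_fetches)` is used on its own.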

Or use map

library(purrr)
library(dplyr)
map_dfr(monthly_fetches, bind_rows, .id = 'grp')
akrun
  • Please see my updated question. I'm trying to modify your answer a little to take start and end dates for each fetch. `monthly_fetches` returns a list of 0. Could you tell me why? I think I'm missing something in my `Map` and `lapply` functions. – Omar Gonzales Apr 05 '20 at 00:20
  • @akrun when I do: `total <- do.call(rbind, monthly_fetches)` with any of the options provided, I'm getting a matrix, when a row-bound data frame was expected. – Omar Gonzales Apr 05 '20 at 00:33
  • @OmarGonzales Here, you may need `Map(function(x) do.call(rbind, Map(function(start, end) my_fetch(x, y, start, end), monthly_dates$starts, monthly_dates$ends)), df$unit, df$unit_id)` and then wrap `do.call(rbind` on top – akrun Apr 05 '20 at 00:36
  • got this error: `Error in (function (x) : unused argument (dots[[2]][[1]])` – Omar Gonzales Apr 05 '20 at 00:38
  • @OmarGonzales is it possible to update with the `dput` of a single list element? – akrun Apr 05 '20 at 00:45
  • @OmarGonzales i.e. `dput(monthly_fetches[[1]])` – akrun Apr 05 '20 at 00:47
  • @OmarGonzales try `do.call(rbind,lapply(monthly_fetches, function(x) do.call(rbind, x)))` – akrun Apr 05 '20 at 00:50
  • @OmarGonzales based on your `dput`, it is working for me – akrun Apr 05 '20 at 00:52