multiple maps to mutate a data frame and add a column

Question

I am working with some data which looks like:

# A tibble: 2 x 3
  splits          id        inner_resamples 
  <named list>    <chr>     <named list>    
1 <split [20/20]> Resample1 <tibble [6 x 2]>
2 <split [20/20]> Resample2 <tibble [6 x 2]>

What I want to do is to map over the inner_resamples column and map again over the splits column in each inner_resamples column. For each list I would like to map again.

The way to do this is to use the analysis function from the rsample package.

map(cv_rolling$inner_resamples$`1`$splits, ~ analysis(.x)) %>% tail()

What I would like to do is to map over each of the outputs and create new data 7 columns:

    > map(cv_rolling$inner_resamples$`1`$splits, ~ analysis(.x)) %>% tail()
[[1]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-13 CAT1   796.     1
2 2016-12-14 CAT1   797.     0

[[2]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-15 CAT1   798.     1
2 2016-12-16 CAT1   791.     0

[[3]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-19 CAT1   794.     1
2 2016-12-20 CAT1   796.     0

[[4]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-21 CAT1   795.     0
2 2016-12-22 CAT1   791.     0

[[5]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-23 CAT1   790.     0
2 2016-12-27 CAT1   792.     1

[[6]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-28 CAT1   785.     0
2 2016-12-29 CAT1   783.     0

Expected output would be (for 1 of the outputs)

[[6]]
# A tibble: 2 x 4
  time       ID    Value   out    NEWCOL
  <date>     <chr> <dbl> <dbl>    
1 2016-12-28 CAT1   785.     0    8677 
2 2016-12-29 CAT1   783.     0    8757

However I would like to also do this for each N in the data:

map(cv_rolling$inner_resamples$`N`$splits, ~ analysis(.x)) %>% tail()

Where N here can be accessed by:

cv_rolling$inner_resamples[[1]]
cv_rolling$inner_resamples[[2]]
cv_rolling$inner_resamples[[N]]

New Data:

structure(list(time = structure(c(17136, 17137, 17140, 17141, 
17142, 17143, 17144, 17147, 17148, 17149, 17150, 17151, 17154, 
17155, 17156, 17157, 17158, 17162, 17163, 17164, 17165, 17136, 
17137, 17140, 17141, 17142, 17143, 17144, 17147, 17148, 17149, 
17150, 17151, 17154, 17155, 17156, 17157, 17158, 17162, 17163, 
17164, 17165), class = "Date"), ID = c("CAT1", "CAT1", "CAT1", 
"CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", 
"CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", 
"CAT1", "CAT1", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", 
"CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", 
"CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2"), Value = c(747.919983, 
750.5, 762.52002, 759.109985, 771.190002, 776.419983, 789.289978, 
789.27002, 796.099976, 797.070007, 797.849976, 790.799988, 794.200012, 
796.419983, 794.559998, 791.26001, 789.909973, 791.549988, 785.049988, 
782.789978, 771.820007, 56.283112, 56.330643, 57.252861, 56.996159, 
58.346195, 58.003925, 58.916634, 59.106773, 59.876858, 59.591648, 
59.496574, 59.230362, 60.485325, 60.409275, 60.409275, 60.418777, 
60.124058, 60.162071, 59.886375, 59.800812, 59.078251), out = c(0, 
1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0)), row.names = c(NA, 
-42L), index_quo = ~date, index_time_zone = "UTC", class = c("tbl_time", 
"tbl_df", "tbl", "data.frame"))

Also need to run:

library(rsample)
library(purrr)
library(tibbletime)

periods_train <- 2
periods_test  <- 1
skip_span     <- 1

cv_rolling <- nested_cv(df, 
                        outside = group_vfold_cv(group = "ID"),
                        inside = rolling_origin(
                          initial    = periods_train,
                          assess     = periods_test,
                          cumulative = FALSE,
                          skip       = skip_span))

Where the following can be run:

map(cv_rolling$inner_resamples$`2`$splits, ~ analysis(.x))

Which is what I am trying to modify / create new data from.

I am not so sure that this will work for this data since at depth = 3 I have `cv_rolling$inner_resamples[[1]]` which can be changed to `cv_rolling$inner_resamples[[2]]`... `cv_rolling$inner_resamples[[N]]`. I am still reading about the function though. — user113156, Oct 19 '19 at 15:56
I'm trying to read in your data, but getting "Error in is_null(vars) : argument "data" is missing, with no default" — camille, Oct 19 '19 at 16:23
I will upload better data now. I have made quite some edits to the original post since what I originally stated was correct but accessed the "wrong" data. I will get the fresh data now. — user113156, Oct 19 '19 at 16:24
Okay, I added new data, in order to get it into the original `dput()` output the `nested_cv` function should be run. — user113156, Oct 19 '19 at 16:37
It is not clear to me, with what function you generate NEWCOL based on your call to `map(cv_rolling$inner_resamples$`1`$splits, ~ analysis(.x))`. — TimTeaFan, Oct 19 '19 at 22:38
I also had trouble dealing with splits from rsample in map calls. Maybe my question [here](https://stackoverflow.com/questions/58234447/bootstrapping-by-multiple-groups-in-the-tidyverse-rsample-vs-broom) can help? — TimTeaFan, Oct 19 '19 at 22:41

score 1 · Answer 1 · answered Oct 20 '19 at 09:16

I am not sure, what kind of function you want to apply to generate NEWCOL, but here is some toy example for your data which just divides the original Value column by 10:

cv_rolling %>% 
  mutate(data  = map(inner_resamples, "splits"),
         data2 = map_depth(data, 2, rsample::analysis),
         data3 = map_depth(data2, 2, ~ mutate(.x, NEWCOL = Value/10)))

If the mutate call is rather complex you could put it in a helper function.

mutate_helper <- function(df) {
  mutate(df, NEWCOL = Value/10)
}

cv_rolling %>% 
  mutate(data  = map(inner_resamples, "splits"),
         data2 = map_depth(data, 2, rsample::analysis),
         data3 = map_depth(data2, 2, mutate_helper))

multiple maps to mutate a data frame and add a column

1 Answers1