I have several distinct data sets with different column names, but the functions to be applied to them are similar. To reduce code duplication, I thought I could create a second data frame that maps each column name to the function to be applied to it. So there are two pieces:
- raw data (whose column positions can change, so we rely on column headers)
- dataframe with column headers and corresponding function to be applied
### The raw data set
library(dplyr)
df1 <- tibble(A = c(NA, 1, 2, 3), B = c(1, 2, 1, NA),
C = c(NA, NA, NA, 2), D = c(2, 3, NA, 1), E = c(NA, NA, NA, 1))
# A tibble: 4 x 5
A B C D E
<dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 1 NA 2 NA
2 1 2 NA 3 NA
3 2 1 NA NA NA
4 3 NA 2 1 1
### The dataframe containing functions
funcDf <- tibble(colNames = names(df1), type = c(rep("Compulsory", 4), "Conditional"))
funcDf$func <- c("is.na()", "is.na()", "is.na()", "is.na()",
"ifelse(!is.na(D) & is.na(E), 0, ifelse(!is.na(D) & !is.na(E), 1, 0))")
# A tibble: 5 x 3
colNames type func
<chr> <chr> <chr>
1 A Compulsory is.na()
2 B Compulsory is.na()
3 C Compulsory is.na()
4 D Compulsory is.na()
5 E Conditional ifelse(!is.na(D) & is.na(E), 0, ifelse(!is.na(D) & !is.na(E), 1,~
I am able to get a simple sum running, like so:
df1 %>% summarise_at(.vars = funcDf$colNames, .funs = list(~sum(., na.rm = TRUE)))
However, I cannot work out how to apply the functions recorded in funcDf to their corresponding columns.
Any guidance, please :)
Edit
I expect the following output from applying the functions:
# A tibble: 1 x 5
A B C D E
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 3 1 2
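To make the goal concrete, here is a minimal sketch of evaluating stored expression strings against the data. It assumes, hypothetically, that each entry of `func` is rewritten as a complete expression that names its column (for example `is.na(A)` rather than `is.na()`), and that the resulting indicator is then summed per column. It uses base R `parse()` and `eval()` on a plain data frame to stay dependency-free; `exprs` and `applyStored` are illustrative names, not part of the code above.

```r
# Same values as df1 above, as a plain data frame to keep the sketch dependency-free
df1 <- data.frame(A = c(NA, 1, 2, 3), B = c(1, 2, 1, NA),
                  C = c(NA, NA, NA, 2), D = c(2, 3, NA, 1), E = c(NA, NA, NA, 1))

# Assumption: each stored string is a complete expression naming its column
exprs <- c(A = "is.na(A)", B = "is.na(B)", C = "is.na(C)", D = "is.na(D)",
           E = "ifelse(!is.na(D) & is.na(E), 0, ifelse(!is.na(D) & !is.na(E), 1, 0))")

# Parse each string, evaluate it with the data frame's columns in scope,
# and sum the resulting indicator vector (hypothetical helper)
applyStored <- function(data, exprs) {
  vapply(exprs,
         function(e) as.numeric(sum(eval(parse(text = e), envir = data), na.rm = TRUE)),
         numeric(1))
}

result <- applyStored(df1, exprs)
print(result)
```

`result` is a named numeric vector with one value per column. With the nested `ifelse()` exactly as written, E sums to 1 on this data, so the E value in the expected output above may need rechecking.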
@YinYan, thanks so much for indulging me. Following up on my comment: what if I need the output grouped, as in the code below?
df1 %>% group_by(A, B) %>% summarise_all(.funs = list(~sum(., na.rm = TRUE)))
# A tibble: 4 x 5
# Groups: A [4]
A B C D E
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 0 3 0
2 2 1 0 0 0
3 3 NA 2 1 1
4 NA 1 0 2 0