In my example data I have 3 dataframes. Every df has 2 variables (varA and varB) per threshold. There are 3 thresholds (1, 2, 3):
library(tidyverse)  # provides tibble(), map(), pull(), add_column() and %>%

df1 <- tibble(
  var1A = rnorm(10) + 1,
  var1B = rnorm(10) + 1,
  var2A = rnorm(10) + 2,
  var2B = rnorm(10) + 2,
  var3A = rnorm(10) + 3,
  var3B = rnorm(10) + 3)
df2 <- tibble(
  var1A = rnorm(10) + 1,
  var1B = rnorm(10) + 1,
  var2A = rnorm(10) + 2,
  var2B = rnorm(10) + 2,
  var3A = rnorm(10) + 3,
  var3B = rnorm(10) + 3)
df3 <- tibble(
  var1A = rnorm(10) + 1,
  var1B = NA,  # the whole column is missing
  var2A = rnorm(10) + 2,
  var2B = rnorm(10) + 2,
  var3A = rnorm(10) + 3,
  var3B = rnorm(10) + 3)
Now I want to run a t-test on each pair of variables, t.test(varA, varB), for each threshold (1, 2, 3).
Since I have more than one data frame, I pass their names to map() and, inside it, apply t.test to every threshold:
thresholds <- c(1, 2, 3)
list_dfs <- c('df1', 'df2', 'df3')
map(list_dfs,
    function(df_name) {
      x <- get(df_name)  # look the data frame up by its name
      # one t-test per threshold; t.test() errors if a column is entirely NA
      lapply(thresholds, function(i) {
        t.test(x %>% pull(paste0("var", i, "A")),
               x %>% pull(paste0("var", i, "B")))
      }) %>%
        map_df(broom::tidy) %>%
        add_column(.before = 'estimate',
                   df = df_name,
                   threshold = thresholds)
    }) %>%
  do.call(rbind, .)
This code collects all results in one data frame. The problem is that var1B in df3 is empty: the whole column is NA, so t.test() stops with an error.
How can I still run the map call even though there are not enough observations for var1B?
Here is my desired output:
# A tibble: 9 x 12
  df    threshold estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method
  <chr>     <dbl>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>
1 df1           1   -0.582     0.992      1.57     -1.43   0.170      16.6    -1.44     0.276 Welch~
2 df1           2    0.271      2.75      2.48     0.654   0.522      17.8   -0.601      1.14 Welch~
3 df1           3   -0.250      3.12      3.37    -0.544   0.593      17.7    -1.22     0.716 Welch~
4 df2           1   -0.169     0.747     0.916    -0.407   0.690      15.3    -1.05     0.714 Welch~
5 df2           2   0.0259      1.94      1.91    0.0702   0.945      17.9   -0.748     0.800 Welch~
6 df2           3    0.496      3.28      2.79      1.11   0.281      17.5   -0.444      1.44 Welch~
7 df3           1       NA        NA        NA        NA      NA        NA       NA        NA NA
8 df3           2   -0.274      1.99      2.26    -0.650   0.525      15.8    -1.17     0.622 Welch~
9 df3           3    0.407      3.34      2.93     0.920   0.371      16.6   -0.529      1.34 Welch~
Because varB for threshold 1 in df3 is NA, row 7 of the output is also NA.
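One direction that might produce this kind of output, given only as a rough sketch and not a confirmed solution: wrap the test in purrr::possibly() so that a failing t.test() yields a one-row NA tibble instead of stopping the whole map. The helper name safe_tidy_t and the reduced toy data (a single threshold, two data frames named df_ok and df_na) are assumptions made for illustration, and the broom package is assumed to be installed.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy data mirroring the question: the second tibble has an all-NA B column.
set.seed(42)
dfs <- list(
  df_ok = tibble(var1A = rnorm(10) + 1, var1B = rnorm(10) + 1),
  df_na = tibble(var1A = rnorm(10) + 1, var1B = NA_real_)
)
thresholds <- 1

# safe_tidy_t is a hypothetical helper: it returns the tidied t-test,
# or a one-row NA tibble when t.test() raises an error.
safe_tidy_t <- possibly(
  function(a, b) broom::tidy(t.test(a, b)),
  otherwise = tibble(estimate = NA_real_)
)

# imap() passes each data frame together with its list name.
res <- imap(dfs, function(x, df_name) {
  map(thresholds, function(i) {
    safe_tidy_t(x[[paste0("var", i, "A")]], x[[paste0("var", i, "B")]]) %>%
      add_column(df = df_name, threshold = i, .before = 1)
  }) %>%
    bind_rows()
}) %>%
  bind_rows()  # columns missing from the fallback row are filled with NA

print(res)
```

Passing the data frames as a named list avoids get(), and bind_rows() fills in the columns that the fallback row lacks, so the failed test shows up as a row of NA values next to its df and threshold labels.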