
I want to create a loop that stores the output of t-tests for several variables in a data frame. When I store the variable names in a character vector, however, they cannot be used in the t-test because they are strings. For example, R takes the first variable as "variable_1" in the loop, which produces an error, because the t-test needs the bare variable name, e.g. t.test(variable_1 ~ Gender). Does someone know how to get rid of the quotation marks around the variable names stored in a vector?

variable <- c("variable_1", "variable_2", "variable_3") 
df <- data.frame(t_value=as.numeric(), 
                 df=as.numeric(),
                 p_value= as.numeric(), 
                 mean_f= as.numeric(),
                 mean_m= as.numeric())
attach(data)
for(v in variable){
  output <- t.test(v ~ Gender)
  values <- output[c(1,2,3,5)]
  row <- round(unlist(values, use.names = FALSE),3)
  df <- rbind(df, row)
}
r2evans
Anabel
  • I know that some classes and tutorials *still* suggest the use of `attach`, but I have yet to find a situation where it added safety or structure; almost all of the time, it creates ambiguities both in the mind of the user and in the language itself. I strongly recommend that you learn how to operate without it. – r2evans Sep 22 '20 at 19:07
  • Gender is missing from the MWE. Ditto re no attach – Richard Careaga Sep 22 '20 at 19:34
  • To "get rid of the quotation marks" you need non-standard evaluation. A good introduction is given in [advanced R](https://adv-r.hadley.nz/metaprogramming.html) – starja Sep 22 '20 at 20:17
  • Thanks for all the helpful comments! I was not aware that the use of attach comes with drawbacks, but I will definitely stop using it. – Anabel Oct 04 '20 at 18:24

3 Answers


Here is a somewhat more modern approach using non-standard evaluation and purrr. I've put the logic of your loop into a function that is called once for each entry of `variable`. Inside the function, the value of `v`, which is a string, is turned into a symbol, i.e. your variable name. That symbol is then evaluated in the context of the data frame passed to the `data` argument of `t.test`.

library(purrr)

variable <- c("variable_1", "variable_2", "variable_3") 

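calc_fun <- function(v, input_data) {
  # turn the string v into a symbol so the formula refers to the column itself
  output <- t.test(eval(rlang::sym(v)) ~ Gender, data = input_data)
  # elements 1, 2, 3, 5: statistic, parameter (df), p.value, estimate (two group means)
  values <- output[c(1,2,3,5)]
  values <- round(unlist(values, use.names = FALSE),3)
  # return a one-row data frame; map_dfr() binds the rows together
  data.frame(t_values = values[1],
             df = values[2],
             p_value = values[3],
             mean_f = values[4],
             mean_m = values[5])
}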

df <- map_dfr(variable, ~calc_fun(v = .x, input_data = data))

Using @Chuck P's example, my approach looks like this:

variable <- c("mpg", "hp")

calc_fun <- function(v, input_data) {
  output <- t.test(eval(rlang::sym(v)) ~ am, data = input_data)
  values <- output[c(1,2,3,5)]
  values <- round(unlist(values, use.names = FALSE),3)
  data.frame(t_values = values[1],
             df = values[2],
             p_value = values[3],
             mean_f = values[4],
             mean_m = values[5])
}

df <- map_dfr(variable, ~calc_fun(v = .x, input_data = mtcars))
df
  t_values     df p_value  mean_f  mean_m
1   -3.767 18.332   0.001  17.147  24.392
2    1.266 18.715   0.221 160.263 126.846
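
As a variation that is not part of the answer above, the formula can also be built directly from the strings with base R's `reformulate()`, which avoids both `eval()` and the rlang dependency. This is only a minimal sketch under the same mtcars assumptions; the helper name `t_row` and the column names `mean_1` and `mean_2` are made up for illustration.

```r
# Hypothetical helper (not from the original answers): builds `v ~ group`
# from two strings and returns one row of t-test results.
t_row <- function(v, group, input_data) {
  f <- reformulate(group, response = v)   # e.g. mpg ~ am
  out <- t.test(f, data = input_data)
  data.frame(
    variable = v,
    t_value  = unname(out$statistic),
    df       = unname(out$parameter),
    p_value  = out$p.value,
    mean_1   = unname(out$estimate[1]),   # mean of the first group level
    mean_2   = unname(out$estimate[2])    # mean of the second group level
  )
}

variable <- c("mpg", "hp")

# Collect one row per variable, then bind them in a single step
res <- do.call(rbind, lapply(variable, t_row, group = "am", input_data = mtcars))
res
```

With your own data, `group = "Gender"` and `input_data = data` would presumably take the place of the mtcars arguments.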
starja

Here are some changes that will make it work via `get`. As others have pointed out, `attach` is a terrible idea in this context, so I've used mtcars as an example and left `attach` out.

I've made several other changes to get the loop into the best shape it can be. That said, you'd be much better served by searching Stack Overflow for the many existing answers on running a t-test on multiple variables, or by just using @starja's or @r2evans's answer.

variable <- c("mpg", "hp") 
df <- data.frame(t_value=as.numeric(), 
                 df=as.numeric(),
                 p_value= as.numeric(), 
                 mean_f= as.numeric(),
                 mean_m= as.numeric())

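for(v in variable){
   # get(v) resolves the string v to the column of that name when the formula is evaluated
   output <- t.test(get(v) ~ am, data = mtcars)
   # elements 1, 2, 3, 5: statistic, parameter (df), p.value, estimate (two group means)
   values <- output[c(1,2,3,5)]
   row <- round(unlist(values, use.names = FALSE), 3)
   # build a named one-row data frame so rbind() keeps the column names
   df_row <- data.frame(t_value=row[[1]],
                        df=row[[2]],
                        p_value= row[[3]],
                        mean_f= row[[4]],
                        mean_m= row[[5]])

   df <- rbind(df, df_row)
}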
df
#>   t_value     df p_value  mean_f  mean_m
#> 1  -3.767 18.332   0.001  17.147  24.392
#> 2   1.266 18.715   0.221 160.263 126.846
Chuck P

If you need to compare one variable against all (or some) other variables in a frame, then you can do something like this:

vars <- c("cyl", "disp", "hp", "gear")
do.call(
  rbind.data.frame,
  lapply(setNames(nm = vars), function(nm) {
    out <- t.test(mtcars[["mpg"]], mtcars[[nm]])
    c(out[c(1, 2, 3)], out[[5]])
  })
)
#      statistic parameter      p.value mean.of.x mean.of.y
# cyl   12.51163  36.40239 9.507708e-15  20.09062    6.1875
# disp  -9.60236  31.14661 7.978234e-11  20.09062  230.7219
# hp   -10.40489  31.47905 1.030354e-11  20.09062  146.6875
# gear  15.28179  31.92893 3.077106e-16  20.09062    3.6875

If you need to compare various pairs (not just all against one), then perhaps something like this:

vars <- c("mpg", "cyl", "disp", "hp", "gear")
eg <- expand.grid(vars, vars, stringsAsFactors = FALSE)
eg <- eg[ eg[,1] != eg[,2], ]
head(eg)
#   Var1 Var2
# 2  cyl  mpg
# 3 disp  mpg
# 4   hp  mpg
# 5 gear  mpg
# 6  mpg  cyl
# 8 disp  cyl

ret <- do.call(
  rbind.data.frame,
  Map(function(x, y) {
    out <- t.test(x, y)
    c(out[c(1, 2, 3)], out[[5]])
  }, mtcars[eg[,1]], mtcars[eg[,2]])
)
ret <- cbind(eg, ret)
head(ret)
#   Var1 Var2 statistic parameter      p.value mean.of.x mean.of.y
# 2  cyl  mpg -12.51163  36.40239 9.507708e-15   6.18750  20.09062
# 3 disp  mpg   9.60236  31.14661 7.978234e-11 230.72188  20.09062
# 4   hp  mpg  10.40489  31.47905 1.030354e-11 146.68750  20.09062
# 5 gear  mpg -15.28179  31.92893 3.077106e-16   3.68750  20.09062
# 6  mpg  cyl  12.51163  36.40239 9.507708e-15  20.09062   6.18750
# 8 disp  cyl  10.24721  31.01287 1.774454e-11 230.72188   6.18750

---

Note:

1. Iteratively building a frame row by row works fine logically and in small doses, but in the long run it performs very poorly: it makes a complete copy of the whole frame with each added row, which is memory-inefficient (and slow).

2. The use of `attach` is discouraged, as I said in my comment. Also, `get` should be avoided as well, though perhaps to a lesser degree than `attach`.
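
To illustrate the first note, here is one possible way to avoid growing a frame inside a loop: let `vapply()` collect fixed-length results into a single matrix and convert it once at the end. This is a sketch rather than the only option; it assumes mtcars with `am` as the two-level grouping column, and the result names (`t_value`, `mean_x`, and so on) are chosen for illustration.

```r
vars <- c("mpg", "hp")

# Each call returns a named numeric vector of length 5; vapply() checks the
# length and type of every result and assembles them into one matrix.
stats <- vapply(vars, function(v) {
  x <- mtcars[[v]][mtcars$am == 0]   # values in the first group
  y <- mtcars[[v]][mtcars$am == 1]   # values in the second group
  out <- t.test(x, y)
  c(t_value = unname(out$statistic),
    df      = unname(out$parameter),
    p_value = out$p.value,
    mean_x  = out$estimate[[1]],
    mean_y  = out$estimate[[2]])
}, FUN.VALUE = c(t_value = 0, df = 0, p_value = 0, mean_x = 0, mean_y = 0))

# stats has one column per variable; transpose to get one row per variable
res <- as.data.frame(t(stats))
res
```

Column extraction with `[[` also sidesteps both `attach` and `get`, in line with the second note.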
r2evans
  • Thank you for your comment! Unfortunately, something went wrong when I tried your first example. I did not get an error message, but the output was not correct (e.g. the mean for y was the same for all variables, although I wanted to calculate the mean for males). I have applied your suggestion to my data as follows: `do.call(rbind.data.frame, lapply(setNames(nm = vars), function(nm) { out <- t.test(data[nm], data["Gender"]); c(out[c(1, 2, 3)], out[[5]]) }))` – Anabel Oct 04 '20 at 19:15
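
On the follow-up in the last comment: `t.test(x, y)` treats `y` as a second sample, so passing the grouping column as `y` compares each variable against the group codes themselves, which is why the mean of y never changes. Splitting one variable by group calls for the formula interface instead. The following is a minimal sketch, assuming mtcars with `am` standing in for `Gender`; the column names `mean_group_0` and `mean_group_1` are illustrative.

```r
vars <- c("mpg", "hp")

res <- do.call(
  rbind,
  lapply(setNames(nm = vars), function(nm) {
    # left-hand side: the measured variable; right-hand side: the grouping column
    out <- t.test(mtcars[[nm]] ~ mtcars[["am"]])
    data.frame(
      statistic    = unname(out$statistic),
      parameter    = unname(out$parameter),
      p.value      = out$p.value,
      mean_group_0 = out$estimate[[1]],   # mean where am == 0
      mean_group_1 = out$estimate[[2]]    # mean where am == 1
    )
  })
)
res
```

For the data in the question, `data[[nm]] ~ data[["Gender"]]` would presumably be the corresponding call.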