I have an issue with performing a t-test over all columns of my dataframe.

  1. What do I want to do? Each column represents a KPI for a certain research question. Each column can also be split into two groups, target = 1 and non-target = 0, as defined in the "target" column. I want to run a t-test per column between those two groups and save the p-value and t-value in a separate dataframe.

  2. The code below works and gives no error message. However, I would need to type it 70 times, once per column, changing the column name each time (here "X1y_ret"), which is not practical for the 70 columns in my dataframe.

    t.test(X1y_ret ~ target, data = analysis_matched_targets_all, var.equal = FALSE)$p.value

  3. That's why I tried a for loop, shown below. However, it doesn't work: I get an error message about differing variable lengths (found for 'target').

    for (i in colnames(analysis_matched_targets_all)) {
        statistics_all_types_all_year[3, which(colnames(analysis_matched_targets_all) == i)] <- t.test(i ~ target, data = analysis_matched_targets_all, var.equal = FALSE)$p.value
    }

I would really appreciate it if you could help me with my problem :)

user438383
  • Try the formula `get(i) ~ target`. As written, `i` is just a character string; whether it holds the name of a data set column or any other string is irrelevant. – Rui Barradas Aug 31 '22 at 10:30
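
A minimal sketch of the comment's suggestion, assuming the built-in `sleep` data set as a stand-in for the real data: `group` plays the role of `target`, and `anothervar` is a made-up second KPI column. With `i ~ target`, the left-hand side is a single string while `target` has one value per row, which is consistent with the "variable lengths differ" error. `get(i)` instead looks up the column named by `i`.

```r
# Toy data: `sleep` ships with R; `anothervar` is invented for illustration.
dat <- transform(sleep, anothervar = rev(extra))

# Every column except the grouping column and the subject ID is treated as a KPI.
kpi_cols <- setdiff(colnames(dat), c("group", "ID"))

# Named vector that will hold one p-value per KPI column.
p_values <- setNames(numeric(length(kpi_cols)), kpi_cols)

for (i in kpi_cols) {
  # get(i) resolves the string in `i` to the column of that name inside `dat`.
  p_values[i] <- t.test(get(i) ~ group, data = dat, var.equal = FALSE)$p.value
}

p_values
```

On the real data, `kpi_cols` would presumably be something like `setdiff(colnames(analysis_matched_targets_all), "target")`, so that the grouping column is not tested against itself.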

1 Answer

I've focused my answer on your part 2, which seems to be at the heart of what you want to know: how to iterate, that is, how to apply a function many times over different input parameters.


    library(dplyr)  # provides mutate()

    # invent a dataset, not with 70 vars to t.test, but with 2
    (sleep_with_more_vars <- mutate(sleep,
                                    anothervar = rev(extra)))

    # show how we would do each of the 2 by hand
    t.test(extra ~ group, data = sleep_with_more_vars, var.equal = FALSE)$p.value
    t.test(anothervar ~ group, data = sleep_with_more_vars, var.equal = FALSE)$p.value

    # show how we would do them programmatically, just by naming them
    vars_to_do <- c("extra",
                    "anothervar")

    # build each formula from the variable name, then run the t-test
    (results <- lapply(vars_to_do, function(x) t.test(as.formula(paste0(x, " ~ group")),
                                                      data = sleep_with_more_vars, var.equal = FALSE)$p.value))

    names(results) <- vars_to_do

    results
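
Since the question also asks for the t-value and for the results in a separate dataframe, here is one possible extension of the same idea. It is only a sketch in base R: the helper `run_one` and the column names `variable`, `t_value` and `p_value` are made up for illustration, and `reformulate()` is used in place of `as.formula(paste0(...))` to build each formula.

```r
# Same toy data as above, rebuilt in base R so the snippet stands alone.
dat <- transform(sleep, anothervar = rev(extra))
vars_to_do <- c("extra", "anothervar")

# Hypothetical helper: one Welch t-test per variable, returned as a one-row data frame.
run_one <- function(v) {
  # reformulate("group", response = v) builds the formula  v ~ group
  tt <- t.test(reformulate("group", response = v), data = dat, var.equal = FALSE)
  data.frame(variable = v,
             t_value  = unname(tt$statistic),
             p_value  = tt$p.value)
}

# Stack the one-row results into a single data frame, one row per variable.
ttest_results <- do.call(rbind, lapply(vars_to_do, run_one))
ttest_results
```

For the 70-column case, `vars_to_do` could be derived from the column names rather than typed out, for example by dropping the grouping column with `setdiff()`.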
Nir Graham