For Loop t.test, Comparing Means by Factor Class in R

Question

I want to run many one-sided t-tests in a loop, comparing mean crop harvest value by pattern for a set of different crops.

My data is structured like this:


df <- data.frame("crop" = rep(c('Beans', 'Corn', 'Potatoes'), 10),
                 "value" = rnorm(n = 30),
                 "pattern" = rep(c("mono", "inter"), 15),
                 stringsAsFactors = TRUE)

I would like the output to provide results from a t.test, comparing mean harvest of each crop by pattern (i.e. compare harvest of mono-cropped potatoes to intercropped potatoes), where the alternative is greater value for the intercropped pattern.

Help!

bschneidr · Answer 1 · 2019-01-11T23:38:57.520

Here's an example using base R.

# Generate example data
df <- data.frame("crop" = rep(c('Beans', 'Corn', 'Potatoes'), 10),
                 "value" = rnorm(n = 30),
                 "pattern" = rep(c("inter", "mono"), 15),
                 stringsAsFactors = TRUE)

# Create a list which will hold the output of the test for each crop
crops <- as.character(unique(df$crop))
test_output <- vector('list', length = length(crops))
names(test_output) <- crops

# For each crop, save the output of a one-sided t-test
for (crop in crops) {
  # Filter the data to include only observations for the particular crop
  crop_data <- df[df$crop == crop, ]
  # Save the results of a t-test with a one-sided alternative
  test_output[[crop]] <- t.test(formula = value ~ pattern,
                                data = crop_data,
                                alternative = 'greater')
}

Note that when you call t.test with the formula interface (e.g. y ~ x) and the independent variable is a factor, alternative = 'greater' tests whether the mean in the first factor level (in your data, "inter") is greater than the mean in the second factor level (here, "mono"). Factor levels are sorted alphabetically by default, which is why "inter" comes first.
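
To check which direction a given call actually tests, it can help to inspect the factor levels and the group means that t.test reports. The following is a minimal sketch that assumes the question's column names; the set.seed call, the explicit factor() call, and the names potatoes and res are additions for illustration only.

```r
# Illustrative data in the same shape as the question's (seed added for reproducibility)
set.seed(42)
df <- data.frame(crop = rep(c("Beans", "Corn", "Potatoes"), 10),
                 value = rnorm(n = 30),
                 pattern = rep(c("mono", "inter"), 15),
                 stringsAsFactors = TRUE)

# Levels are sorted alphabetically by default, so "inter" is the first level
levels(df$pattern)

# Optional: state the level order explicitly so the direction of the test is unambiguous
df$pattern <- factor(df$pattern, levels = c("inter", "mono"))

# One-sided test for a single crop: H1 is mean(inter) > mean(mono)
potatoes <- df[df$crop == "Potatoes", ]
res <- t.test(value ~ pattern, data = potatoes, alternative = "greater")

# The names of the estimates show which group is treated as first
res$estimate
```

If the level order were reversed, the same hypothesis could be tested by passing alternative = "less" instead.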

Using `by` could eliminate `unique`, `vector`, `names`, `for`, and `[` lines! — Parfait, Jan 11 '19 at 23:45
That's a great suggestion. I think it would be valuable to add that as an answer to the question. — bschneidr, Jan 11 '19 at 23:48

score 0 · Answer 2 · answered Jan 11 '19 at 23:44

Here's a "tidyverse" approach. It uses the tidy function from the broom package, which stores the output of a t-test as a data frame.

Instead of an explicit for loop, the group_by and do functions from the dplyr package run the test once per crop.

library(dplyr)
library(broom)

# Generate example data
df <- data.frame("crop" = rep(c('Beans', 'Corn', 'Potatoes'), 10),
                 "value" = rnorm(n = 30),
                 "pattern" = rep(c("inter", "mono"), 15),
                 stringsAsFactors = TRUE)

# Group the data by crop, and run a t-test for each subset of data.
# Use the tidy function from the broom package
# to capture the t.test output as a data frame
df %>%
  group_by(crop) %>%
  do(tidy(t.test(formula = value ~ pattern,
                 data = .,
                 alternative = 'greater')))
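
Recent dplyr releases mark do() as superseded, although it still works. One possible alternative is group_modify(), sketched below on the assumption that the installed dplyr version provides it; the set.seed call and the results name are illustrative and not part of the original answer.

```r
library(dplyr)
library(broom)

# Illustrative data in the same shape as above (seed added for reproducibility)
set.seed(1)
df <- data.frame(crop = rep(c("Beans", "Corn", "Potatoes"), 10),
                 value = rnorm(n = 30),
                 pattern = rep(c("inter", "mono"), 15),
                 stringsAsFactors = TRUE)

# group_modify() applies the function to each group's rows (.x)
# and binds the returned data frames together, one row per crop here
results <- df %>%
  group_by(crop) %>%
  group_modify(~ tidy(t.test(value ~ pattern,
                             data = .x,
                             alternative = "greater")))

# One row per crop, with the test statistic and one-sided p-value
results[, c("crop", "statistic", "p.value")]
```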

Yes! I was so close to this earlier. Thanks for your help. — A David, Jan 11 '19 at 23:50
Absolutely. Good luck solving your analysis problem! — bschneidr, Jan 11 '19 at 23:52

score 0 · Answer 3 · answered Jan 11 '19 at 23:53

Consider by, an object-oriented wrapper for tapply designed to split a data frame by one or more factors and run an operation on each subset:

t_test_list <- by(df, df$crop, function(sub)
  t.test(formula = value ~ pattern,
         data = sub,
         alternative = 'greater'))
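
The object returned by by holds one htest result per crop, so individual pieces can be pulled out with sapply. A minimal sketch, assuming the same example data as the other answers; set.seed and the names p_values and t_stats are illustrative additions.

```r
# Illustrative data in the same shape as the other answers (seed added for reproducibility)
set.seed(1)
df <- data.frame(crop = rep(c("Beans", "Corn", "Potatoes"), 10),
                 value = rnorm(n = 30),
                 pattern = rep(c("inter", "mono"), 15),
                 stringsAsFactors = TRUE)

# One one-sided t-test per crop
t_test_list <- by(df, df$crop, function(sub)
  t.test(value ~ pattern, data = sub, alternative = "greater"))

# Pull the p-value and t statistic out of each htest object
p_values <- sapply(t_test_list, function(res) res$p.value)
t_stats  <- sapply(t_test_list, function(res) unname(res$statistic))

p_values
t_stats

# A single result can also be inspected by crop name
t_test_list[["Potatoes"]]
```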
