0

I want to loop a lot of one sided t.tests, comparing mean crop harvest value by pattern for a set of different crops.

My data is structured like this:


df <- data.frame("crop" = rep(c('Beans', 'Corn', 'Potatoes'), 10),
                 "value" = rnorm(n = 30),
                 "pattern" = rep(c("mono", "inter"), 15),
                 stringsAsFactors = TRUE)

I would like the output to provide results from a t.test, comparing mean harvest of each crop by pattern (i.e. compare harvest of mono-cropped potatoes to intercropped potatoes), where the alternative is greater value for the intercropped pattern.

Help!

bschneidr
  • 6,014
  • 1
  • 37
  • 52
A David
  • 48
  • 7

3 Answers3

0

Here's an example using base R.

# Generate example data
df <- data.frame("crop" = rep(c('Beans', 'Corn', 'Potatoes'), 10),
                 "value" = rnorm(n = 30),
                 "pattern" = rep(c("inter", "mono"), 15),
                 stringsAsFactors = TRUE)

# Create a list which will hold the output of the test for each crop
  crops <- unique(df$crop)
  test_output <- vector('list', length = length(crops))
  names(test_output) <- crops

# For each crop, save the output of a one-sided t-test
  for (crop in crops) {
    # Filter the data to include only observations for the particular crop
    crop_data <- df[df$crop == crop,]
    # Save the results of a t-test with a one-sided alternative
    test_output[[crop]] <- t.test(formula = value ~ pattern,
                                  data = crop_data,
                                  alternative = 'greater')
  }

It's important to note that when calling t-test with the formula interface (e.g. y ~ x) and where your independent variable is a factor, then using the setting alternative = 'greater' will test whether the mean in the lower factor level (in the case of your data, "inter") is greater than the mean in the higher factor level (here, that's "mono").

bschneidr
  • 6,014
  • 1
  • 37
  • 52
0

Here's the elegant "tidyverse" approach, which makes use of the tidy function from broom which allows you to store the output of a t-test as a data frame.

Instead of a formal for loop, the group_by and do functions from the dplyr package are used to accomplish the same thing as a for loop.

library(dplyr)
library(broom)

# Generate example data
  df <- data.frame("crop" = rep(c('Beans', 'Corn', 'Potatoes'), 10),
                   "value" = rnorm(n = 30),
                   "pattern" = rep(c("inter", "mono"), 15),
                   stringsAsFactors = TRUE)

# Group the data by crop, and run a t-test for each subset of data.
# Use the tidy function from the broom package
# to capture the t.test output as a data frame

  df %>% 
    group_by(crop) %>% 
    do(tidy(t.test(formula = value ~ pattern,
                   data = .,
                   alternative = 'greater')))
bschneidr
  • 6,014
  • 1
  • 37
  • 52
0

Consider by, object-oriented wrapper to tapply designed to subset a data frame by factor(s) and run operations on subsets:

t_test_list <- by(df, df$crop, function(sub) 
                   t.test(formula = value ~ pattern,
                          data = sub, alternative = 'greater')
                 )
Parfait
  • 104,375
  • 17
  • 94
  • 125