t-test in R with multiple grouped variables

Question

I have a dataset like this:

long_id	short_id	quants	treatment	avg_amount
7210	721015	Short	0	2528582.92
7210	721015	Medium	0	1893851.78
7210	721015	Long	0	2274530.74
7210	721015	Short	1	1301169.80
7210	721015	Medium	1	1442934.90
7210	721015	Long	1	1582988.01
7210	721022	Short	0	1569400.78
7210	721022	Medium	0	25463492.9
7210	721022	Long	0	58901706.6
7210	721022	Short	1	81037294.1
7210	721022	Medium	1	1491750.90
7210	721022	Long	1	8721906.01

And so on for multiple IDs. I also have the same dataset in wide format, where treatment "0" becomes the variable treatment_0 and treatment "1" becomes the variable treatment_1.

I'd like to run a t-test comparing the mean of avg_amount between treatment 0 and treatment 1, grouped by long_id, short_id, and quants.

Here is what I tried:

stat.test <- df %>%
  group_by(short_id, long_id, quants) %>%
  t_test(avg_amount ~ treatment ) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance()

Error : Problem with mutate() column data. ℹ data = map(.data$data, .f, ...). x not enough 'x' observations

df2 <- data_wide %>% 
  group_by(short_id, long_id, quants) %>% 
  do(tidy(t.test(.$treatment_0,
                 .$treatment_1,
                 mu = 0, 
                 alt = "two.sided", 
                 paired = F, 
                 conf.level = 0.99)))

Error in t.test.default(.$treatment_0, .$treatment_1, mu = 0, alt = "two.sided", : not enough 'x' observations

I also tried paired = T, but the result is the same.

stat.test <- df %>%
  group_by(short_id, long_id, treatment, quants, avg_amount) %>%
  t_test(avg_amount ~ treatment) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance()

Error : Problem with mutate() column data. ℹ data = map(.data$data, .f, ...). x Can't extract columns that don't exist. x Column treatment doesn't exist.

This is the first time I have needed to run a t-test like this.

Is it possible that you do not have every combination of short_id, long_id and quants? Try running this to count the number of members in each group: `df %>% group_by(short_id, long_id, quants, treatment) %>% summarize(counts = n())` – Dave2e, Jul 27 '21 at 21:58
@Dave2e All my counts are equal to 1, so I suppose that is right? I found a topic here about these errors saying that it is probably an encoding problem in the variable names, but I found no problem with names() or dput(), so I'm lost. – katdataecon, Jul 28 '21 at 10:29
One point per group is not enough data to perform a t-test. You will have to group by fewer variables. For example, `group_by(short_id, long_id)` should work, but with only 3 points for each treatment it is a low-power test. – Dave2e, Jul 28 '21 at 11:09
Oh okay, now I understand! It was the grouping by long_id that caused the problem. I grouped by short_id and quants and I obtained something! Now I have to figure out how to interpret this big table. Thanks! – katdataecon, Jul 28 '21 at 13:02
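
The point made in the comments can be checked on the twelve sample rows from the question. The following is only a minimal sketch, assuming dplyr is installed: it counts the observations per group and treatment, shows that a t-test on a single value per side fails, and then groups by long_id and short_id only, as Dave2e suggested. The names counts, one_point_error, results, n_obs, n_0, n_1 and p_value are illustrative and do not come from the original post, and the test used is base R's default Welch t-test rather than rstatix's t_test().

```r
library(dplyr)

# The twelve sample rows shown in the question.
df <- data.frame(
  long_id    = 7210,
  short_id   = rep(c(721015, 721022), each = 6),
  quants     = rep(c("Short", "Medium", "Long"), times = 4),
  treatment  = rep(rep(c(0, 1), each = 3), times = 2),
  avg_amount = c(2528582.92, 1893851.78, 2274530.74,
                 1301169.80, 1442934.90, 1582988.01,
                 1569400.78, 25463492.9, 58901706.6,
                 81037294.1, 1491750.90, 8721906.01)
)

# Step 1: count observations per group and treatment.
# With short_id, long_id and quants all in the grouping, each cell holds one row.
counts <- df %>%
  count(short_id, long_id, quants, treatment, name = "n_obs")

# Step 2: a two-sample t-test with one value on each side cannot estimate
# a variance, so t.test() stops with an error; capture its message.
one_point_error <- tryCatch(
  t.test(df$avg_amount[1], df$avg_amount[4]),
  error = function(e) conditionMessage(e)
)

# Step 3: drop quants from the grouping so each treatment has 3 values per group.
# This runs, but 3 vs 3 observations gives a low-power test.
results <- df %>%
  group_by(long_id, short_id) %>%
  summarise(
    n_0     = sum(treatment == 0),
    n_1     = sum(treatment == 1),
    p_value = t.test(avg_amount[treatment == 0],
                     avg_amount[treatment == 1])$p.value,
    .groups = "drop"
  )

print(counts)
print(one_point_error)
print(results)
```

Running the count in step 1 before any grouped test is a quick way to see whether a grouping leaves at least two observations per treatment in every group.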
