I have a dataset like this:

long_id  short_id  quants  treatment  avg_amount
7210     721015    Short   0          2528582.92
7210     721015    Medium  0          1893851.78
7210     721015    Long    0          2274530.74
7210     721015    Short   1          1301169.80
7210     721015    Medium  1          1442934.90
7210     721015    Long    1          1582988.01
7210     721022    Short   0          1569400.78
7210     721022    Medium  0          25463492.9
7210     721022    Long    0          58901706.6
7210     721022    Short   1          81037294.1
7210     721022    Medium  1          1491750.90
7210     721022    Long    1          8721906.01

And so on for multiple IDs. I also have the same dataset in wide format, where treatment 0 becomes the variable treatment_0 and treatment 1 becomes the variable treatment_1.

I'd like to run a t-test to compare the mean of avg_amount between treatment 0 and treatment 1, grouped by long_id, short_id, and quants.

Here is what I tried:

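library(dplyr)
library(rstatix)

# Attempt 1 (long format): one t-test per (short_id, long_id, quants) group
stat.test <- df %>%
  group_by(short_id, long_id, quants) %>%
  t_test(avg_amount ~ treatment) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance()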

Error: Problem with `mutate()` column `data`. ℹ `data = map(.data$data, .f, ...)`. x not enough 'x' observations

library(broom)

# Attempt 2 (wide format): compare the two treatment columns within each group
df2 <- data_wide %>%
  group_by(short_id, long_id, quants) %>%
  do(tidy(t.test(.$treatment_0,
                 .$treatment_1,
                 mu = 0,
                 alt = "two.sided",
                 paired = FALSE,
                 conf.level = 0.99)))

Error in t.test.default(.$treatment_0, .$treatment_1, mu = 0, alt = "two.sided", : not enough 'x' observations

I also tried paired = TRUE, but the result is the same.
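For reference, the same message can be reproduced outside the pipeline. This is a minimal sketch, assuming each group ends up with a single value per treatment; the two numbers are the Short rows of short_id 721015 from the table above, and `one_obs` is a name made up for illustration.

```r
# Welch two-sample t-test with one observation on each side.
# The variance of a single value is undefined, so t.test() stops.
one_obs <- tryCatch(
  t.test(2528582.92, 1301169.80),
  error = function(e) conditionMessage(e)
)

# Holds the error message instead of a test result
print(one_obs)
```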

# Attempt 3: treatment and avg_amount added to the grouping variables
stat.test <- df %>%
  group_by(short_id, long_id, treatment, quants, avg_amount) %>%
  t_test(avg_amount ~ treatment) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance()

Error: Problem with `mutate()` column `data`. ℹ `data = map(.data$data, .f, ...)`. x Can't extract columns that don't exist. x Column `treatment` doesn't exist.

This is the first time I've needed to run a t-test like this.

katdataecon
  • Is it possible you do not have every combination of short_id, long_id and quants? Try running this to count the number of members in each group: `data_wide %>% group_by(short_id, long_id, quants, treatment) %>% summarize(counts=n())` – Dave2e Jul 27 '21 at 21:58
  • @Dave2e All my counts are equal to 1, so I suppose that is right? I found a topic here about these errors saying it is probably an encoding problem in the variable names, but I found no problem with names() or dput(), so I'm lost. – katdataecon Jul 28 '21 at 10:29
  • One point per group is not enough data to perform a t-test. You will have to group by fewer variables. For example, `group_by(short_id, long_id)` should work, but with only 3 points for each treatment it is a low-power test. – Dave2e Jul 28 '21 at 11:09
  • Oh okay, now I understand! It was the grouping by long_id that caused the problem. I grouped by short_id and quants and I obtained something! Now I have to figure out how to interpret this big table! Thanks! – katdataecon Jul 28 '21 at 13:02
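
The diagnosis in the comments can be sketched as follows. This is only an illustration under stated assumptions: `df` is rebuilt from the twelve sample rows in the question (the real data has more IDs), the names `cell_counts`, `res`, `n_obs`, `n_0`, `n_1`, `p_value` and `p_adj` are made up here, and base `t.test()` with `p.adjust()` stands in for the rstatix pipeline.

```r
library(dplyr)

# Sample rows copied from the question
df <- data.frame(
  long_id    = 7210,
  short_id   = rep(c(721015, 721022), each = 6),
  quants     = rep(c("Short", "Medium", "Long"), times = 4),
  treatment  = rep(rep(c(0, 1), each = 3), times = 2),
  avg_amount = c(2528582.92, 1893851.78, 2274530.74,
                 1301169.80, 1442934.90, 1582988.01,
                 1569400.78, 25463492.9, 58901706.6,
                 81037294.1, 1491750.90, 8721906.01)
)

# Step 1: count observations per (short_id, long_id, quants, treatment) cell.
# A count of 1 everywhere means a t-test at this grouping level cannot run.
cell_counts <- df %>%
  count(short_id, long_id, quants, treatment, name = "n_obs")

# Step 2: drop quants from the grouping so each treatment has 3 values,
# then run a Welch t-test per group and adjust the p-values with BH.
res <- df %>%
  group_by(short_id, long_id) %>%
  summarise(
    n_0     = sum(treatment == 0),
    n_1     = sum(treatment == 1),
    p_value = t.test(avg_amount[treatment == 0],
                     avg_amount[treatment == 1])$p.value,
    .groups = "drop"
  ) %>%
  mutate(p_adj = p.adjust(p_value, method = "BH"))

print(cell_counts)
print(res)
```

With only three values per treatment in each group, the test has little power, as noted in the comment above.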

0 Answers