
This is most likely a very simple question, but I'll ask it anyway since I haven't found an answer. How can I compare the number of "cases" (for example, flu) in two groups, i.e. find out whether the difference between the case counts in the groups is statistically significant? Can I apply some sort of t-test? Or is this kind of comparison even meaningful?

I'd prefer to do the comparison in R.

A very simple data example:

group1 <- 1000 # size of group 1
group2 <- 1000 # size of group 2

group1_cases <- 550 # the amount of cases in group 1
group2_cases <- 70 # the amount of cases in group 2
Pinus

1 Answer


I think a chisq.test is what you are looking for.

group1 <- 1000 # size of group 1
group2 <- 1000 # size of group 2

group1_cases <- 550 # the amount of cases in group 1
group2_cases <- 70 # the amount of cases in group 2

# Non-cases are the group size minus the cases (use the size variables, not a hard-coded 1000)
group1_noncases <- group1 - group1_cases
group2_noncases <- group2 - group2_cases


M <- as.table(rbind(c(group1_cases, group1_noncases),
                    c(group2_cases, group2_noncases)))

dimnames(M) <- list(groups = c("1", "2"),
                    cases = c("yes","no"))

res <- chisq.test(M)

# The null hypothesis, that both groups have the same proportion of cases, is rejected:

res
#> 
#>  Pearson's Chi-squared test with Yates' continuity correction
#> 
#> data:  M
#> X-squared = 536.33, df = 1, p-value < 2.2e-16

# If both groups had the same proportion of cases, these would be the expected counts:

res$expected
#>       cases
#> groups yes  no
#>      1 310 690
#>      2 310 690

Created on 2021-04-28 by the reprex package (v0.3.0)
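
To make the expected counts less of a black box, here is a minimal sketch of how they can be derived by hand from the table margins (row total times column total, divided by the grand total). The name expected_by_hand is made up for this illustration, and the table is rebuilt so the snippet runs on its own.

```r
# Rebuild the 2x2 table from above so this snippet is self-contained
M <- matrix(c(550, 450,
              70, 930),
            nrow = 2, byrow = TRUE,
            dimnames = list(groups = c("1", "2"),
                            cases = c("yes", "no")))

# Expected count per cell = row total * column total / grand total
# (expected_by_hand is a hypothetical name used only in this sketch)
expected_by_hand <- outer(rowSums(M), colSums(M)) / sum(M)
expected_by_hand

# Check against the values reported by chisq.test()
all.equal(unname(expected_by_hand), unname(chisq.test(M)$expected))
```

The larger the gap between the observed and these expected counts, the larger the X-squared statistic becomes.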

Strictly speaking, a t.test is not the correct method here, because the outcome is binary (case / no case). However, people do use it for this kind of comparison, and in most cases the p-values are very similar.

# t test
dat <- data.frame(groups = c(rep("1", 1000), rep("2", 1000)),
       values = c(rep(1, group1_cases),
                  rep(0, group1_noncases),
                  rep(1, group2_cases),
                  rep(0, group2_noncases)))

t.test(dat$values ~ dat$groups)

#> 
#>  Welch Two Sample t-test
#> 
#> data:  dat$values by dat$groups
#> t = 27.135, df = 1490.5, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  0.4453013 0.5146987
#> sample estimates:
#> mean in group 1 mean in group 2 
#>            0.55            0.07

Created on 2021-04-28 by the reprex package (v0.3.0)
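
If what you want is a confidence interval for the difference in proportions (which the t-test output above provides), a possible alternative is prop.test from base R. This is only a sketch using the counts from the question; the variable names cases, sizes and res_prop are made up for this illustration. For two groups, and with the default continuity correction, it should report the same X-squared statistic as the chisq.test call on the 2x2 table.

```r
# Counts from the question: cases and group sizes
cases <- c(550, 70)
sizes <- c(1000, 1000)

# Two-sample test for equality of proportions
# (continuity correction is applied by default, as in chisq.test)
res_prop <- prop.test(x = cases, n = sizes)

res_prop$statistic  # X-squared statistic
res_prop$p.value    # p-value of the test
res_prop$estimate   # estimated proportion of cases per group
res_prop$conf.int   # confidence interval for the difference in proportions
```

Unlike the t-test, this treats the data as counts of successes out of a known number of trials, so there is no need to expand the data into a vector of 0s and 1s.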

TimTeaFan