
I am using Tomczak and Tomczak's (2014) formula to calculate eta squared for the Kruskal-Wallis H test with the following code:

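x <- Data$text
res <- kruskal.test(x ~ Data$group)
H <- unname(res$statistic)
n <- sum(table(x, Data$group))   # table() drops NAs, so n counts complete cases
k <- unname(res$parameter) + 1   # number of groups = degrees of freedom + 1

eta_squared <- (H - k + 1) / (n - k)
print(eta_squared)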
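Written out, the code above computes Tomczak and Tomczak's eta squared for the H statistic. Here H is the Kruskal-Wallis statistic, k is the number of groups (the test's degrees of freedom plus one) and n is the total number of observations:

```latex
\eta^2_H = \frac{H - k + 1}{n - k}
```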

For reproducibility purposes here is the data:

x <- c(2,2,3,3,3,3,3,4,5,6,6,6,7,7,8,8,9,10,11,11,13,9,10,11,12,19,19,23,26,30,8,14,16,24,26,43,46)


group1 <- rep("group1", 21)
group2 <- rep("group2", 9)
group3 <- rep("group3", 7)


df <- data.frame(group = c(group1, group2, group3), result = c(x))

However, when I compare these results with those from the rstatix package, they sometimes differ, so I am not sure which one to report. I looked at the source code but cannot tell where the difference comes from. What is the source of the difference?

library(rstatix)
kruskal_effsize(
  Data,
  x ~ group,
  ci = FALSE,
  conf.level = 0.95,
  ci.type = "perc",
  nboot = 1000
)
CatM
  • It would be simpler if you pasted the output of `dput(Data)` rather than a tabulation of the data. Then we could actually run your code. Also, you do not tell us which results differ and by how much. One possibility is that the two computations handle tied ranks differently. Another is that one estimates the confidence interval with a direct formula (which may or may not adjust for tied ranks) while the other uses bootstrapping, which simulates the variation expected in the data and uses the results to estimate the confidence intervals. – dcarlson Sep 01 '20 at 04:27
  • The package gives 0.549, whilst my code gives 0.58. I am not able to share my data using dput because it is sensitive. – CatM Sep 01 '20 at 12:56
  • @dcarlson Just added some reproducible code, hope that helps. – CatM Sep 01 '20 at 13:33

1 Answer


I'm not getting your results. First, here is your initial code revised to use df:

res <- kruskal.test(result~group, df)
H <- unname(res$statistic)
n <- sum(table(df$result, df$group))
k <- unname(res$parameter)+1
(eta_squared <- (H-k+1)/(n - k))
# [1] 0.5812849
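To make the n that enters the formula explicit, the same calculation can be wrapped in a small function. This is only a sketch: `eta2_kw` is a hypothetical name, not a function from rstatix or base R. It counts n as the number of rows left after incomplete rows are dropped, which is what `sum(table(...))` does above, because `table()` ignores NAs by default.

```r
# Hypothetical helper (not part of rstatix or base R): Tomczak and Tomczak's
# eta squared for the Kruskal-Wallis H test, with n counted from the rows the test uses
eta2_kw <- function(formula, data) {
  # Drop rows with a missing response or group before counting anything
  mf <- model.frame(formula, data = data, na.action = na.omit)
  res <- kruskal.test(mf[[1]], mf[[2]])   # response, then grouping variable
  H <- unname(res$statistic)
  k <- unname(res$parameter) + 1          # number of groups = df + 1
  n <- nrow(mf)                           # complete observations only
  (H - k + 1) / (n - k)
}

# Example data from the question
x <- c(2,2,3,3,3,3,3,4,5,6,6,6,7,7,8,8,9,10,11,11,13,
       9,10,11,12,19,19,23,26,30,
       8,14,16,24,26,43,46)
df <- data.frame(group = rep(c("group1", "group2", "group3"), times = c(21, 9, 7)),
                 result = x)

eta2_kw(result ~ group, df)   # about 0.581, as above
```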

Now the other computation:

kruskal_effsize(df, x ~ group, ci = FALSE, conf.level = 0.95,
       ci.type = "perc", nboot = 1000)
# A tibble: 1 x 5
#   .y.       n effsize method  magnitude
# * <chr> <int>   <dbl> <chr>   <ord>    
# 1 x        37   0.581 eta2[H] large 
dcarlson
  • It may be because kruskal_effsize, in my case, includes the NAs in "n", as you can see here: `# A tibble: 1 x 5`, with `.y. = result`, `n = 39`, `effsize = 0.549`, `method = eta2[H]`, `magnitude = large`. – CatM Sep 01 '20 at 15:54
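The numbers in that comment fit this explanation. The sketch below assumes, hypothetically (the real data are not available), that the data hold the 37 values from the question plus two rows with a missing result, giving the 39 rows reported above. `kruskal.test()` drops the incomplete rows, so H and k are the same either way; only n in the denominator changes. It uses base R only and does not call rstatix.

```r
# Example data from the question: 37 non-missing observations in 3 groups
x <- c(2,2,3,3,3,3,3,4,5,6,6,6,7,7,8,8,9,10,11,11,13,
       9,10,11,12,19,19,23,26,30,
       8,14,16,24,26,43,46)
df <- data.frame(group = rep(c("group1", "group2", "group3"), times = c(21, 9, 7)),
                 result = x)

# Hypothetical: two extra rows with a missing outcome, giving 39 rows in total
df_na <- rbind(df, data.frame(group = c("group1", "group2"), result = c(NA, NA)))

# kruskal.test() omits the incomplete rows, so H and df match the complete data
res <- kruskal.test(result ~ group, data = df_na)
H <- unname(res$statistic)
k <- unname(res$parameter) + 1

n_complete <- sum(complete.cases(df_na))  # rows actually used by the test: 37
n_all <- nrow(df_na)                      # all rows, including the NAs: 39

eta2_complete <- (H - k + 1) / (n_complete - k)
eta2_all <- (H - k + 1) / (n_all - k)
round(c(complete = eta2_complete, all_rows = eta2_all), 3)
```

Counting all 39 rows lowers the estimate from about 0.581 to about 0.549, the same pair of values reported in the question. If the package does count rows this way, dropping rows whose outcome is missing before calling `kruskal_effsize()` should make the two approaches agree.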