I'm going crazy over this issue:
I'm computing (lots of) ICCs at the moment to assess the retest reliability of brain parameters, using the function 'icc' from the package 'irr' by Matthias Gamer. However, I noticed several unrealistic p-values in the output. For example, a moderate ICC (two variables, no missing values) of .675 in a large sample of N = 583 comes with a non-significant p-value of .102, which seems highly unlikely given the moderate effect and the large sample (I'm currently calculating thousands of ICCs and usually obtain p-values < .001 for ICCs > .200).
To be sure, I reproduced the EXACT same ICC using different software and obtained the expected p-value < .001 there (a sketch of an equivalent cross-check within R follows the output below). However, I would like to use the irr package for an automated, code-based analysis. This is the code I'm using:
library(irr)
# 'Example_dataframe' is a placeholder for my real data: one row per subject, one column per rater
load("Example_dataframe.RData")
icc(Example_dataframe, model = "twoway", type = "agreement", unit = "average")
And this is the complete output I get:
Average Score Intraclass Correlation
Model: twoway
Type : agreement
Subjects = 583
Raters = 2
ICC(A,2) = 0.675
F-Test, H0: r0 = 0 ; H1: r0 > 0
F(582,2) = 9.27 , p = 0.102
95%-Confidence Interval for ICC Population Values:
-0.237 < ICC < 0.893
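For what it's worth, here is a sketch of how the same coefficient could be cross-checked within R, independently of the other software I mentioned. This assumes 'Example_dataframe' has one column per rater; psych::ICC reports the whole McGraw & Wong family of coefficients with F-tests and confidence intervals, and if I remember the labels correctly, ICC(A,2) corresponds to the ICC2k row ("Average_random_raters"):
# Hypothetical cross-check with the psych package (not my actual analysis code)
# install.packages("psych")   # if not already installed
library(psych)
ICC(Example_dataframe)   # ICC(A,2) should correspond to the ICC2k row
If that row also shows p < .001, it would at least confirm that the discrepancy is specific to the irr implementation.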
Coming back to the irr output: the confidence interval is extremely wide, but an effect of .675 should still be highly significant with N = 583. It looks to me like there is a bug in the icc function, which would be surprising given that the package is quite popular and has been in use for more than a decade. The variables I'm using have no missing values. I tried to contact the maintainer but have received no response.
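To make the problem more concrete, here is a sketch, based on my reading of McGraw & Wong (1996), of how ICC(A,k) and its F-test against rho0 = 0 can be computed directly from the two-way ANOVA mean squares. The function name icc_a_k_check and the use of 'Example_dataframe' are just for illustration (rows = subjects, columns = raters). If I read the formulas correctly, the test against rho0 = 0 reduces to F = MSR/MSE with df1 = n - 1 and df2 = (n - 1)(k - 1), which in my case would be (582, 582) rather than (582, 2), but I may well be misreading the paper.
icc_a_k_check <- function(ratings) {
  x   <- as.matrix(na.omit(ratings))
  n   <- nrow(x)                           # subjects
  k   <- ncol(x)                           # raters
  MSr <- var(rowMeans(x)) * k              # between-subjects mean square
  MSc <- var(colMeans(x)) * n              # between-raters mean square
  SSt <- var(as.numeric(x)) * (n * k - 1)  # total sum of squares
  MSe <- (SSt - MSr * (n - 1) - MSc * (k - 1)) / ((n - 1) * (k - 1))  # residual MS
  icc_ak <- (MSr - MSe) / (MSr + (MSc - MSe) / n)   # ICC(A,k), McGraw & Wong
  # With rho0 = 0 the general test statistic reduces to F = MSr/MSe,
  # df1 = n - 1, df2 = (n - 1)(k - 1) (as far as I can tell from the paper)
  Fval <- MSr / MSe
  df1  <- n - 1
  df2  <- (n - 1) * (k - 1)
  p    <- pf(Fval, df1, df2, lower.tail = FALSE)
  list(ICC_Ak = icc_ak, Fvalue = Fval, df1 = df1, df2 = df2, p.value = p)
}
icc_a_k_check(Example_dataframe)
The mean squares here are just the standard two-way ANOVA decomposition with one observation per cell, so the only thing this sketch pins down is which degrees of freedom I would have expected for the test against zero.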
My questions:
Is there anyone with experience with the 'irr' package who can help me? Do you agree that this must be due to a bug? What else could be the problem?
Any help is highly appreciated; I'm running out of ideas about what to do here...
Kind regards, Tobias