I have a list of 282 items that has been classified by 6 independent coders into 20 categories.
The 20 categories are labelled with single words (e.g. "perceptual", "evaluation", etc.).
The 6 coders differ in status: 3 of them are experts and 3 are novices.
I calculated the kappas (and alphas) between each pair of coders, the overall kappa among all 6 coders, and the kappas within the group of 3 experts and within the group of 3 novices.
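For reference, this is roughly how I computed those statistics. It is only a minimal Python sketch of my setup: the ratings matrix is filled with placeholder random data, the coder ordering (first 3 columns = experts) is my own convention, and the use of scikit-learn and statsmodels is just one possible choice of tools.

```python
import itertools
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# ratings: 282 items x 6 coders, each cell a category code 0..19
# (random placeholder data here, just so the sketch runs)
rng = np.random.default_rng(0)
ratings = rng.integers(0, 20, size=(282, 6))

experts, novices = [0, 1, 2], [3, 4, 5]  # column indices of the coders

# Pairwise Cohen's kappa for every pair of coders
pairwise = {
    (i, j): cohen_kappa_score(ratings[:, i], ratings[:, j])
    for i, j in itertools.combinations(range(6), 2)
}

def group_kappa(cols):
    """Fleiss' kappa among the given subset of coders."""
    table, _ = aggregate_raters(ratings[:, cols])  # items x categories count table
    return fleiss_kappa(table)

print("overall kappa (6 coders):", group_kappa(list(range(6))))
print("experts' kappa:", group_kappa(experts))
print("novices' kappa:", group_kappa(novices))
```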
Now I would like to test whether there is a significant difference between the inter-rater agreement achieved by the experts and that achieved by the novices (whose kappa is indeed lower).
How would you approach this question and report the results?
thanks!