I am fairly new to R and I am trying to run a Kruskal-Wallis test to see whether there is a difference between three groups across a set of genes. I have 3 groups and 127 proteins. I have been able to write code that does this:
Sample data:
groups <- c("control","control","control","control","control","group1","group1","group1","group1","group1","group1","group1","group1","group1","group1","group1","group1","group1","group2","group2","group2","group2","group2","group2","group2","group2")
gene1 <- c(8,7,4,5,0,2,8,5,6,4,4,6,5,4,6,4,7,4,8,1,6,3,5,6,3,1)
gene2 <- c(8,10,10,9,7,5,8,10,8,9,10,9,6,9,8,7,8,7,8,9,9,7,7,6,9,8)
gene3 <- c(10,11,10,11,5,6,9,11,10,11,12,8,4,7,7,10,10,3,2,11,9,10,9,3,10,10)
gene4 <- c(4,4,3,2,0,2,4,4,3,3,4,1,1,1,4,4,3,2,3,4,4,1,4,3,2,2)
gene5 <- c(8,10,11,10,7,6,8,8,8,12,11,8,7,8,8,10,10,9,10,8,10,7,8,7,10,7)
df <- data.frame(groups,gene1,gene2,gene3,gene4,gene5)
i <- 2 #ignore 1st column as this is not a "protein"
pval <- NULL
repeat {
  K <- kruskal.test(df[,i], df[,1], data = df, paired=FALSE, p.adjust.methods="none")
  pval <- c(as.matrix(sapply(K[3],as.numeric)),pval)
  i <- i+1
  if (i > ncol(df)) {break}
}
Unfortunately, the p-value obtained from the loop is different from the one I get when running a Kruskal-Wallis test on just one gene at a time. For example:
For gene1, the p-value obtained from the loop was 0.0389, but when I run kruskal.test(gene1, groups, data=df) I get a p-value of 0.84.
I came across this because, after doing the Kruskal-Wallis test, I proceeded with a pairwise Mann-Whitney test and noticed that the "significant" p-values for Kruskal-Wallis did not agree with the "significant" p-values for Mann-Whitney.
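For reference, the pairwise follow-up on a single gene can be sketched roughly as below. This is only an illustration under assumptions: it rebuilds gene1 from the sample data, the names dat and pw are made up for this sketch, and exact = FALSE is my choice here because the data contain ties.

```r
# Rebuild one gene from the sample data so the block runs on its own
groups <- rep(c("control", "group1", "group2"), times = c(5, 13, 8))
gene1 <- c(8,7,4,5,0,2,8,5,6,4,4,6,5,4,6,4,7,4,8,1,6,3,5,6,3,1)
dat <- data.frame(groups, gene1)   # 'dat' is an illustrative name

# Pairwise Mann-Whitney (Wilcoxon rank-sum) tests between all group pairs,
# with no multiplicity adjustment; exact = FALSE is passed on to wilcox.test
pw <- pairwise.wilcox.test(dat$gene1, dat$groups,
                           p.adjust.method = "none", exact = FALSE)

# Lower-triangular matrix of p-values, one entry per pair of groups
print(pw$p.value)
```

With three groups, pw$p.value is a 2 x 2 matrix whose rows and columns are labelled by group, so each pairwise result can be read off by name.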
Furthermore, I ran the same data through VassarStats and Minitab and got a p-value of 0.84 (with adjustment for ties). I would like to know how I can run this Kruskal-Wallis test in a loop without the p-values being affected. Is there something I am doing incorrectly that I am not seeing?
Also, I have looked at getAnywhere(kruskal.test.default), which I saw in a previous post, but I can't find anything there that would cause this when the test is performed repeatedly.
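To make the comparison concrete, here is a minimal sketch of the per-gene test that keeps each p-value labelled by its column name. It assumes the grouping variable is in the first column and uses only two genes from the sample data; the names dat, gene_cols and pvals are illustrative and not part of my original code.

```r
# Rebuild part of the sample data so the block runs on its own
groups <- rep(c("control", "group1", "group2"), times = c(5, 13, 8))
gene1 <- c(8,7,4,5,0,2,8,5,6,4,4,6,5,4,6,4,7,4,8,1,6,3,5,6,3,1)
gene2 <- c(8,10,10,9,7,5,8,10,8,9,10,9,6,9,8,7,8,7,8,9,9,7,7,6,9,8)
dat <- data.frame(groups, gene1, gene2)   # 'dat' is an illustrative name

# Every column except the first (the grouping variable) is tested
gene_cols <- names(dat)[-1]

# One Kruskal-Wallis test per column; sapply over a character vector
# names each result after the column it came from
pvals <- sapply(gene_cols, function(g) {
  kruskal.test(dat[[g]], factor(dat$groups))$p.value
})

print(pvals)

# The named entry can be compared directly with the one-at-a-time test
single <- kruskal.test(gene1, factor(groups))$p.value
```

Looking results up by name (pvals["gene1"]) rather than by position avoids depending on the order in which they were stored. That may be worth checking in my loop, since c(new_value, pval) places the newest p-value first, so the vector ends up in the reverse of the column order.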