1

I want run a series of kruskal.tests, followed by a dunn_test where the Kruskal was significant. Then print the results of the significant dunn_tests.

library(rstatix)
library(purrr)

iris<- iris

for(i in 1:4){
  a<- colnames(iris)
  anova<- map(names(iris)[1:4], ~ kruskal.test(reformulate('Species', response=.x), data=iris))
  post<- dunn_test(iris, as.formula(paste(a[i], a[5], sep="~")), p.adjust.method= "bonferroni")
  
  if(any(post$p.adj < 0.05 & anova[[i]]$p.value < 0.05)){
    print(a)
    print(setNames(post, a))
  }
}

I'm wondering how I can print only the significant dunntests when the kruskal test of the same variable was significant.

I realize this wouldn't give me the result I'm looking for as the post and anova items are two separate objects, this was just my original line of thinking. Using anova[[i]]$p.valuealso results in a subscript out of bounds error.

hugh_man
  • 399
  • 1
  • 6
  • Some clarifications for me... You mention ANOVA in the title and you have `anova` as an object label, but you are not trying to do an ANOVA, just Kruskal-Wallis, correct? And for your expected results, you only want the pairwise comparisons where p.adj < 0.05 not all pairwise comparisons from `post`? – TrainingPizza Feb 02 '22 at 20:11

2 Answers2

2

Here is a way with a function to just get the post-hoc results you want. It works similar to a loop. I see you are familiar with as.formula and I think that’s the best approach here. I’ve used the same formula for the Kruskal-Wallis test and the Dunn test. The function just uses the column names you want to test and creates the formula from that. If you want to return all post-hoc tests if any one comparison is significant, any works with a small change. The output want captures all the post-hoc tests with at least one significant pairwise comparison.

As an aside from your code, it’s not great practice to create object names that have the same name as functions in R. anova is a base function in R, so you wouldn’t want to then have your result named anova as well. You also create the object a over and over in your loop. It’s the same task, so it should be outside your loop. It’s not a huge deal, but if you start working with larger datasets it can add up.

library(rstatix)
#> 
#> Attaching package: 'rstatix'
#> The following object is masked from 'package:stats':
#> 
#>     filter

iris <- iris

a <- colnames(iris)
a <- a[1:4]

myfunc <- function(y, x){
  form <- as.formula(paste0(y, " ~ ", x))
  res <- rstatix::kruskal_test(form, data = iris)
  
  if (res$p < 0.05){
     post <- rstatix::dunn_test(data = iris, form, p.adjust.method= "bonferroni")
     
     if(any(post$p.adj < 0.05)) return(post)
  }
}

want <- lapply(a, FUN = myfunc, x = "Species")
want <- do.call(rbind, want)

If you only want the specific pairwise comparisons that were significant, because the output from dunn_test is a data.frame, you can just subset it by the adjusted p-values that are less than 0.05 and return that.

myfunc <- function(y, x){
  form <- as.formula(paste0(y, " ~ ", x))
  res <- rstatix::kruskal_test(form, data = iris)
  
  if (res$p < 0.05){
     post <- rstatix::dunn_test(data = iris, form, p.adjust.method= "bonferroni")
     post <- post[post$p.adj < 0.05 ,]
     return(post)
  }
}

As a loop, this would be:

res <- list()
post <- list()
post_sig <- list()
for(i in 1:4){
  form <- as.formula(paste0(a[i], " ~ Species"))
  res[[i]] <- rstatix::kruskal_test(form, data = iris)
  post[[i]] <- rstatix::dunn_test(data = iris, form, p.adjust.method= "bonferroni")
  
  if (res[[i]]$p < 0.05 & any(post[[i]]$p.adj < 0.05)){
       print(post[[i]])
       ## could be blanks in the list here as the number i is used
       ## e.g. if var 2 was not significant, post_sig[[2]] is blank
       post_sig[[i]] <- post[[i]]
  }
}

## if you want results as a dataframe
res <- do.call(rbind, res)
post <- do.call(rbind, post)
post_sig <- do.call(rbind, post_sig)
TrainingPizza
  • 1,090
  • 3
  • 12
  • Thanks so much! My knowledge of using functions and loops is very minimal, let alone using a function with a (y, x) format- so even getting to where my original code was took a long time. This output of this is perfect for what I'm trying to do so I really appreciate your help. – hugh_man Feb 02 '22 at 22:17
  • 1
    @gbalto Glad it was useful. It's definitely challenging to use functions when you're learning, but I think it's worth it for readability and efficiency. Good luck :) – TrainingPizza Feb 02 '22 at 23:00
1

As far as I can tell, your code mostly works. The only major problem is that you misuse the any() function. You misplaced the second parenthesis. Fixing that and rearranging the code a little, I ran this:

library(rstatix)
library(purrr)

iris<- iris

a<- colnames(iris)
anova<- map(names(iris)[1:4], ~ kruskal.test(reformulate('Species', response=.x), data=iris))

for(i in 1:4){
  post<- dunn_test(iris, as.formula(paste(a[i], a[5], sep="~")), p.adjust.method= "bonferroni")
  
  if(any(post$p.adj < 0.05) & anova[[i]]$p.value < 0.05){
    print(a[i])
    print(post)
  }
}

This piece of code prints out one after another, the name of the variable being tested and the output from the dunn_test, so long as the relevant p-value from the kruskal test is less that 0.05, and at least one of the comparisons in the dunn_test is significant.

user3124634
  • 151
  • 7