
I'm looking for help performing the Kruskal-Wallis test on my dataset for a large number of factors. I can perform the test for a single factor, like AD_1yr:

```r
kruskal.test(Shannon ~ AD_1y, data = comm)
```

But I have over 50 factors I want to test, and was hoping there is code that will run the test for every factor without my having to run it manually 50 separate times.

user2988430
    N.B. you may want to consider some type of correction calculation since you are making [multiple comparisons](https://en.wikipedia.org/wiki/Multiple_comparisons_problem). – JasonAizkalns Jan 29 '16 at 18:16
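Following up on the multiple-comparisons point: a minimal sketch of applying R's built-in `p.adjust()` to one p-value per factor. The data frame is simulated and only stands in for the asker's `comm`; the factor names `f1`, `f2`, `f3` and the other variable names are hypothetical. Holm and Benjamini-Hochberg are shown as two common choices, not as a recommendation for this particular study.

```r
# Hypothetical simulated data standing in for `comm`: a numeric Shannon
# column plus three factor columns (f1, f2, f3 are made-up names).
set.seed(1)
comm <- data.frame(
  Shannon = rnorm(60, mean = 2),
  f1 = factor(sample(rep(c("a", "b", "c"), 20))),
  f2 = factor(sample(rep(c("x", "y"), 30))),
  f3 = factor(sample(rep(c("p", "q", "r", "s"), 15)))
)

# One raw Kruskal-Wallis p-value per factor column (default x/g interface)
fac_names <- names(comm)[vapply(comm, is.factor, logical(1))]
raw_p <- vapply(fac_names, function(v)
  kruskal.test(x = comm$Shannon, g = comm[[v]])$p.value, numeric(1))

# Adjust for the number of tests: "holm" controls the family-wise error rate,
# "BH" controls the false discovery rate
adj <- data.frame(
  factor = fac_names,
  p_raw  = unname(raw_p),
  p_holm = unname(p.adjust(raw_p, method = "holm")),
  p_BH   = unname(p.adjust(raw_p, method = "BH")),
  row.names = NULL
)
print(adj)
```

With 50+ factors the adjusted columns are the ones to compare against the chosen significance level.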

1 Answer

We can use `lapply` to loop over the factor columns, build a data.frame from each factor together with the 'Shannon' column, and run `kruskal.test` on it:

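```r
# Logical index of the factor columns in comm
allfactorcolumns <- sapply(comm, is.factor)
# For each factor x, test Shannon against that factor alone
lst <- lapply(comm[allfactorcolumns], function(x)
    kruskal.test(Shannon ~ ., data = data.frame(x, comm['Shannon'])))
```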
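A self-contained variant of the same loop, as a sketch on simulated data (the `comm` stand-in and the factor names `f1`, `f2`, `f3` are hypothetical). It loops over the factor *names* and subsets `comm` instead of building `data.frame(x, ...)`, so each stored test keeps its own factor name in its `data:` line. With the code above, every element prints `data: Shannon by x`, which can make separate tests look like a single one.

```r
# Hypothetical simulated data standing in for `comm` (f1, f2, f3 are made up)
set.seed(42)
comm <- data.frame(
  Shannon = rnorm(60, mean = 2),
  f1 = factor(sample(rep(c("a", "b", "c"), 20))),
  f2 = factor(sample(rep(c("x", "y"), 30))),
  f3 = factor(sample(rep(c("p", "q", "r", "s"), 15)))
)

# Names of all factor columns
fac_names <- names(comm)[vapply(comm, is.factor, logical(1))]

# One separate Kruskal-Wallis test per factor; comm[c("Shannon", v)] keeps the
# factor's real column name, so `.` expands to that single factor
lst <- lapply(fac_names, function(v)
  kruskal.test(Shannon ~ ., data = comm[c("Shannon", v)]))
names(lst) <- fac_names

# Print each test; the data line names the factor being tested
for (v in fac_names) print(lst[[v]])
```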

To collect the p-value, the test statistic and the degrees of freedom from every test into one table:

```r
# One row per test: p-value, chi-squared statistic and df
do.call(rbind, lapply(lst, function(x) data.frame(Pval = x$p.value,
                     stat = x$statistic, df = x$parameter)))
```
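If a tidier table is wanted, with the factor name as an ordinary column and the rows sorted by p-value, one possible sketch follows. It rebuilds `lst` with the answer's own loop on simulated data (the `comm` stand-in, `f1`, `f2`, `f3`, `res` and `res_sorted` are hypothetical names) and pulls each field out with `vapply()`, which also checks that every test returns exactly one numeric value.

```r
# Hypothetical simulated data standing in for `comm`
set.seed(7)
comm <- data.frame(
  Shannon = rnorm(60, mean = 2),
  f1 = factor(sample(rep(c("a", "b", "c"), 20))),
  f2 = factor(sample(rep(c("x", "y"), 30))),
  f3 = factor(sample(rep(c("p", "q", "r", "s"), 15)))
)

# The answer's loop, unchanged
allfactorcolumns <- sapply(comm, is.factor)
lst <- lapply(comm[allfactorcolumns], function(x)
  kruskal.test(Shannon ~ ., data = data.frame(x, comm['Shannon'])))

# One row per factor: chi-squared statistic, degrees of freedom, p-value
res <- data.frame(
  factor  = names(lst),
  chi_sq  = vapply(lst, function(t) unname(t$statistic), numeric(1)),
  df      = vapply(lst, function(t) unname(t$parameter), numeric(1)),
  p_value = vapply(lst, function(t) t$p.value, numeric(1)),
  row.names = NULL
)

# Smallest p-values first
res_sorted <- res[order(res$p_value), ]
print(res_sorted)
```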
akrun
  • @user2988430 Thanks for the feedback. I forgot a closing `)` – akrun Jan 29 '16 at 18:25
  • OK got it to run but it looks like it ran 1 test against all the factors. _data: Shannon by x Kruskal-Wallis chi-squared = 79, df = 79, p-value = 0.4788_ I would like it to run 1 test against each factor separately. – user2988430 Jan 29 '16 at 19:01
  • @user2988430 Actually, it is running each factor with 'Shannon' separately. – akrun Jan 29 '16 at 19:07
  • Hmm OK. I wanted to get an output where it showed the resulting chi-square, df, and p-value for _each_ factor. Is this possible at all? – user2988430 Jan 29 '16 at 19:11
  • @akrun Got an **Error in lapply(lst, function(x) data.frame(Pval = x$p.value, stat = x$statistic, : object 'lst' not found** – user2988430 Jan 29 '16 at 19:21
  • @user2988430 I created the `lst <- lapply(comm[allfactorcolumns], ...` (if you haven't noticed) – akrun Jan 29 '16 at 19:25
  • I think it would be more efficient to create multiple formulas for each model on the same data than to create multiple data frames for each model. – fishtank Jan 30 '16 at 09:13
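fishtank's suggestion, building one formula per factor and running every test on the same `comm`, could look roughly like this. It is a sketch on simulated data; the contents of `comm` and the names `f1`, `f2`, `f3`, `formulas`, `tests` and `out` are hypothetical. `reformulate()` turns a column name into a formula such as `Shannon ~ f1`, and the anonymous function passes it to `kruskal.test()` with `data = comm`, so no per-factor data frame is created.

```r
# Hypothetical simulated data standing in for `comm`
set.seed(3)
comm <- data.frame(
  Shannon = rnorm(60, mean = 2),
  f1 = factor(sample(rep(c("a", "b", "c"), 20))),
  f2 = factor(sample(rep(c("x", "y"), 30))),
  f3 = factor(sample(rep(c("p", "q", "r", "s"), 15)))
)

fac_names <- names(comm)[vapply(comm, is.factor, logical(1))]

# One formula per factor, e.g. Shannon ~ f1, all evaluated against comm
formulas <- lapply(fac_names, function(v) reformulate(v, response = "Shannon"))
names(formulas) <- fac_names
tests <- lapply(formulas, function(f) kruskal.test(f, data = comm))

# Chi-squared, df and p-value for each factor, one row per factor
out <- t(vapply(tests, function(t)
  c(chi_sq = unname(t$statistic), df = unname(t$parameter), p_value = t$p.value),
  numeric(3)))
print(out)
```

Calling `kruskal.test()` through an explicit `function(f)` wrapper, rather than `lapply(formulas, kruskal.test, data = comm)`, keeps the formula method's internal `match.call()` evaluation simple.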