Could you please help me perform a Kruskal-Wallis test on a subset of my data? I would like to test for differences in "N" between "Producers".

names(Isotope.Data)
[1] "Species"         "Name"            "Group"           "Simple_Group"       "Trophic_Group"  
[6] "Sample"          "N"               "C" 

In my CSV file I have a column "Trophic_Group" which separates Consumers and Producers.

table(Isotope.Data$Trophic_Group)

Consumer Producers  
    61         18 

Under the column heading Simple_Group, I have three Producers - Rhodophyta, Seagrass and Phaeophyceae

table(Isotope.Data$Simple_Group)

 Abalone  Loliginidae      Octopus Phaeophyceae   Rhodophyta     Seagrass      Teleost 
      24            2           12            6            9            3           20 
Tunicate 
       3 

I have tried numerous things, but I get various error messages. Would anyone be able to improve on the following code?

kruskal.test(C ~ Simple_Group, data = Isotope.Data, subset = Isotope.Data$Trophic_Group = "Producers") 

P.S. I have created a separate CSV file which only includes Primary Producers. However, a subsequent Dunn test of multiple comparisons, used to determine which levels differ from each other, gives different significance levels from the same test run on the file that includes both Consumers and Producers.

Greeny
  • I have several questions: What is C when you call `kruskal.test`? What is the error message you get when running the code? – R18 Sep 06 '17 at 11:33
  • C refers to Carbon, and N refers to Nitrogen. I will run separate tests to test for differences in C and N between consumers and producers – Greeny Sep 06 '17 at 11:41
  • The error is: Error: unexpected '=' in "kruskal.test(C ~ Simple_Group, data = Isotope.Data, subset = Isotope.Data$Trophic_Group =" – Greeny Sep 06 '17 at 11:42
  • You need to use `==` and not `=`. – Roman Luštrik Sep 06 '17 at 11:46
  • Thanks Roman, I have tried that also. I get the following error.... Error in kruskal.test.default(numeric(0), integer(0)) : all observations are in the same group – Greeny Sep 06 '17 at 11:47
  • To make comparisons you need individuals from two different groups, and it seems like you have all of them from only one group. – R18 Sep 06 '17 at 11:53
  • I have three different groups: Phaeophyceae, Rhodophyta and Seagrass. I would like to see whether there is a significant difference in C or N between these three producers. – Greeny Sep 06 '17 at 12:30

2 Answers


Maybe this answer will be helpful. It is based on @user295691's answer to:

Kruskal-Wallis test: create lapply function to subset data.frame?

Here you identify the individual groups within which you want to test for differences, and use split to define the subsets of your data frame correctly.

Dummy example:

# create data
val <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))  # takes the values 1 or 2
phase <- rep(c("a", "b", "c"), 20)

df <- data.frame(val, distance, phase)

# get unique groups
ii <- unique(df$phase)

# run Kruskal test on one subset; with data = df, the formula and the
# subset expression are both evaluated inside the data frame
kruskal.test(val ~ distance, data = df,
             subset = phase == "c")
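
Applied to the data in the question, the same subsetting idea might look like the sketch below. The data frame `iso` is a hypothetical stand-in for `Isotope.Data` with made-up values. The trailing whitespace in the "Producers" label is only an assumption about one possible cause of the `numeric(0)` error, since that error means the subset matched no rows.

```r
# Hypothetical stand-in for Isotope.Data; all values are made up
set.seed(1)
iso <- data.frame(
  Simple_Group  = rep(c("Abalone", "Phaeophyceae", "Rhodophyta", "Seagrass"),
                      times = c(24, 6, 9, 3)),
  # Assumption: the label carries stray trailing spaces
  Trophic_Group = rep(c("Consumer", "Producers  "), times = c(24, 18)),
  N             = runif(42, min = 0, max = 10),
  stringsAsFactors = TRUE
)

# Strip stray whitespace so that == "Producers" matches exactly
iso$Trophic_Group <- trimws(as.character(iso$Trophic_Group))

# Keep only the producers and drop factor levels that are now empty
producers <- droplevels(subset(iso, Trophic_Group == "Producers"))

table(producers$Simple_Group)
result <- kruskal.test(N ~ Simple_Group, data = producers)
result
```

Checking `table()` on the subset before running the test shows straight away whether any rows were matched and how many groups remain.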

And now apply the kruskal.test to each group using split:

lapply(split(df, df$phase), function(d) { kruskal.test(val ~ distance, data=d) })

or create a function:

lapply(ii, function(i) { kruskal.test(df$val ~ df$distance, subset=df$phase==i )})

Both produce a test result for each group (the output shown is from the second form):

[[1]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.14881, df = 1, p-value = 0.6997


[[2]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.11688, df = 1, p-value = 0.7324


[[3]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.0059524, df = 1, p-value = 0.9385

Or just get the p-values (notice the addition of $p.value after the kruskal.test):

lapply(ii, function(i) {
  kruskal.test(df$val ~ df$distance,
               subset = df$phase == i)$p.value
})
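
If a named vector is more convenient than a list, `sapply` over the split data returns one p-value per group. This is a minimal sketch on its own dummy data. The seed and the alternating `distance` values are assumptions, chosen so that every phase is guaranteed to contain both distance groups.

```r
# Dummy data; the seed and the alternating distance values are illustrative
set.seed(42)
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = rep(1:2, times = 30),
  phase    = rep(c("a", "b", "c"), 20)
)

# One Kruskal-Wallis p-value per phase, returned as a named numeric vector
p_values <- sapply(split(df, df$phase),
                   function(d) kruskal.test(val ~ distance, data = d)$p.value)
p_values
```

The names of `p_values` are the phase labels, so a single result can be looked up as `p_values["c"]`.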
maycca

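You can also use the map() function from the purrr package to apply a function to each group once the data are split. Note that group_split() comes from dplyr, and that the formula must be the first argument to kruskal.test, with the group passed as data:

library(dplyr)  # group_split() and %>%
library(purrr)  # map()
test <- df %>% group_split(phase) %>% map(~ kruskal.test(val ~ distance, data = .x))
test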
RadRel