Can anyone give me a hint on how to run the Kruskal-Wallis Test below?

My objective: to test whether the growth (agg_rel_abund) of bacteria differs significantly between Forest and Urban habitats for each family.

The code I have tried in R is kruskal.test(Habitat ~ agg_rel_abund, data = my_data), but I know that is wrong because it does not address my objective.

Let me briefly explain my data:

There are two types of sample, F and W.

When the sample name starts with F, the Habitat is Urban.

When the sample name starts with W, the Habitat is Forest.

A Mann-Whitney test, or any other non-parametric test, is fine too, as long as it shows whether the growth (agg_rel_abund) of bacteria differs significantly between Forest and Urban for each family.

Sample Habitat Family agg_rel_abund
F10 Urban Acetobacteraceae 0
F2 Urban Acetobacteraceae 0
F3 Urban Acetobacteraceae 0
F7 Urban Acetobacteraceae 0.000132118
F8 Urban Acetobacteraceae 0
W10 Forest Acetobacteraceae 0
W13 Forest Acetobacteraceae 0
W3 Forest Acetobacteraceae 0
W6 Forest Acetobacteraceae 0
W9 Forest Acetobacteraceae 0
F10 Urban Bacillaceae 0.00488836
F2 Urban Bacillaceae 0.000924825
F3 Urban Bacillaceae 0.001056943
F7 Urban Bacillaceae 0.002378121
F8 Urban Bacillaceae 0.002906593
W10 Forest Bacillaceae 0.000264236
W13 Forest Bacillaceae 0.027876866
W3 Forest Bacillaceae 0.001585414
W6 Forest Bacillaceae 0.001056943
W9 Forest Bacillaceae 0.004492007
F10 Urban Carnobacteriaceae 0
F2 Urban Carnobacteriaceae 0
F3 Urban Carnobacteriaceae 0
F7 Urban Carnobacteriaceae 0
F8 Urban Carnobacteriaceae 0.000132118
W10 Forest Carnobacteriaceae 0
W13 Forest Carnobacteriaceae 0
W3 Forest Carnobacteriaceae 0.000132118
W6 Forest Carnobacteriaceae 0
Phil
Gambit
  • kruskal.test(agg_rel_abund ~ Habitat, data = my_data). The dependent variable should be placed before the ~, i.e. Dep.var ~ Independent.var – Mohanasundaram Jul 08 '21 at 13:50
  • @Mohanasundaram Oh! The code is that simple? OMG... – Gambit Jul 08 '21 at 13:53
  • @Mohanasundaram, I don't need to run the test for every type of bacteria to get the p-value, right? Just wondering... I should just use kruskal.test(agg_rel_abund ~ Habitat, data = my_data), right? – Gambit Jul 08 '21 at 13:58

1 Answer

This question belongs on Cross Validated.

If you want to know whether growth varies with Family, irrespective of Habitat, you can run kruskal.test with agg_rel_abund as the dependent variable and Family as the independent variable.

kruskal.test(agg_rel_abund ~ Family, data = my_data)

If you are sure that there is no difference in growth across families, you can directly run kruskal.test with agg_rel_abund as the dependent variable and Habitat as the independent variable.

kruskal.test(agg_rel_abund ~ Habitat, data = my_data)

Kruskal-Wallis rank sum test

data:  agg_rel_abund by Habitat
Kruskal-Wallis chi-squared = 0.0051556, df = 1, p-value = 0.9428
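Since the question also mentions the Mann-Whitney test: with only two habitats, that test (wilcox.test in R) and kruskal.test should give the same p-value when the normal approximation is used without continuity correction. Below is a minimal sketch on the ten Bacillaceae rows copied from the question's table; the data frame name bac is only an illustrative choice.

```r
# Bacillaceae rows copied from the question's table (5 Urban, 5 Forest)
bac <- data.frame(
  Habitat = rep(c("Urban", "Forest"), each = 5),
  agg_rel_abund = c(0.00488836, 0.000924825, 0.001056943, 0.002378121, 0.002906593,
                    0.000264236, 0.027876866, 0.001585414, 0.001056943, 0.004492007)
)

# Mann-Whitney U test (R calls it the Wilcoxon rank-sum test).
# exact = FALSE and correct = FALSE select the plain normal approximation,
# which is the two-group counterpart of the Kruskal-Wallis chi-squared test.
mw <- wilcox.test(agg_rel_abund ~ Habitat, data = bac,
                  exact = FALSE, correct = FALSE)

# Kruskal-Wallis on the same two groups
kw <- kruskal.test(agg_rel_abund ~ Habitat, data = bac)

print(c(mann_whitney = mw$p.value, kruskal_wallis = kw$p.value))
```

With only five samples per habitat, either test has little power, so a large p-value here should not be read as evidence of no difference.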

For each habitat, you can run kruskal.test to check the significance of the difference in growth among families:

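library(dplyr)

# Collect one result per habitat
out <- list()

for (i in unique(my_data$Habitat)) {
  # Compare families within the current habitat only
  x <- kruskal.test(agg_rel_abund ~ Family,
                    data = my_data[my_data$Habitat == i, ])
  out[[i]] <- c(Kruskal.Wallis.H = x[["statistic"]][["Kruskal-Wallis chi-squared"]],
                Sig = x[["p.value"]],
                df = x[["parameter"]][["df"]])
}

# One row per habitat, labelled in the same order as the loop
out <- bind_rows(out)
out$Habitat <- unique(my_data$Habitat)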
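
The question asks for the opposite split: Urban versus Forest within each family. A possible sketch of that is shown here, using the two families that are complete in the question's table. The names by_family, p_values and result are illustrative, and the Benjamini-Hochberg adjustment is an added assumption for handling many families, not something the question requested.

```r
# Two complete families copied from the question's table (10 samples each)
my_data <- data.frame(
  Sample = rep(c("F10", "F2", "F3", "F7", "F8", "W10", "W13", "W3", "W6", "W9"), times = 2),
  Habitat = rep(rep(c("Urban", "Forest"), each = 5), times = 2),
  Family = rep(c("Acetobacteraceae", "Bacillaceae"), each = 10),
  agg_rel_abund = c(0, 0, 0, 0.000132118, 0, 0, 0, 0, 0, 0,
                    0.00488836, 0.000924825, 0.001056943, 0.002378121, 0.002906593,
                    0.000264236, 0.027876866, 0.001585414, 0.001056943, 0.004492007)
)

# Split the rows by Family, then compare Urban vs Forest within each family
by_family <- split(my_data, my_data$Family)
p_values <- sapply(by_family, function(d) {
  kruskal.test(agg_rel_abund ~ Habitat, data = d)$p.value
})

# One row per family; p_adjusted corrects for testing several families at once
result <- data.frame(
  Family = names(p_values),
  p_value = unname(p_values),
  p_adjusted = unname(p.adjust(p_values, method = "BH"))
)
print(result)
```

A family whose abundance is identical in every sample (for example all zeros) has no rank variation, so its p-value would come back as NaN and is best filtered out before adjusting.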
Mohanasundaram