
I have the data below and am trying to run kruskal.test() over a list of data frames:

df_list <- list(
  `1.3.A` = 
    tibble::tribble(
      ~Person, ~Height, ~Weight,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L
    ),
  `2.2.A` = 
    tibble::tribble(
      ~Person, ~Height, ~Weight,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L
    ), 
  `1.1.B` = 
    tibble::tribble(
      ~Person, ~Height, ~Weight,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L
    )
)

I am trying to run kruskal.test() on each of these 3 data frames, but after hours of searching for a solution I have not succeeded. I am new to R.

My failed attempts are:

snake <- function(i){
  kruskal.test(df$Height ~ df$Person, data = i)
}
snail <- lapply(df_list, "[[", snake)


df_list %>% kruskal.test(df$Height ~ df$Person)

sapply(df_list, function(i) { kruskal.test(df$Height ~ df$Person, data = i)})


Map(function(x) kruskal.test(Height ~ Person), get(df_list))

Map(function(df_list, .f(kruskal.test(Height ~ Person)))

lapply(mget(df_list), function(x) kruskal.test(Height ~ Person))

bunny <- df_list %>%
  kruskal_test(df$Height ~ Person, data = .)

Summary: I am trying to run kruskal.test() over a list of data frames. How can I pass a formula through lapply() or Map() to run kruskal.test() on each data frame in the list?

halfer

1 Answer


Your code is referencing an object called "df", which does not appear to exist. Also, when using kruskal.test with the arguments kruskal.test(formula, data), there is no need to reference the data frame in the formula. Providing kruskal.test a "data" argument will cause the function to search for the formula symbols first in the provided data. In other words, if data frame "x" contains columns "Height" and "Person", then the following would work:

kruskal.test(Height ~ Person, data = x)
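
As a minimal sketch of that point, the snippet below builds a hypothetical data frame `x` (the name and construction are assumptions; the values are copied from the question) and compares the formula interface with passing the two vectors explicitly. Both calls should describe the same test, which illustrates that the formula symbols are looked up inside `data`.

```r
# Hypothetical data frame `x`, mirroring one element of df_list
x <- data.frame(
  Person = rep(c("Alex", "Gerard", "Clyde"), times = 2),
  Height = c(175L, 180L, 179L, 175L, 180L, 179L)
)

# Formula interface: Height and Person are resolved inside `data`
res_formula <- kruskal.test(Height ~ Person, data = x)

# Default interface: pass the response and the grouping vector explicitly
res_default <- kruskal.test(x$Height, factor(x$Person))

# Compare the test statistics produced by the two interfaces
all.equal(res_formula$statistic, res_default$statistic)
```

Writing `x$Height ~ x$Person` together with `data = x` would also run, but the `data` argument then does no work, which is why the plain column names are preferred.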

In your example, you shouldn't reference df. Notice that the code below creates a temporary function with an argument called "i", and that "i" is subsequently referenced:

lapply(df_list, function(i) kruskal.test(Height ~ Person, data = i))

$`1.3.A`

    Kruskal-Wallis rank sum test

data:  Height by Person
Kruskal-Wallis chi-squared = 5, df = 2, p-value = 0.08208


$`2.2.A`

    Kruskal-Wallis rank sum test

data:  Height by Person
Kruskal-Wallis chi-squared = 5, df = 2, p-value = 0.08208


$`1.1.B`

    Kruskal-Wallis rank sum test

data:  Height by Person
Kruskal-Wallis chi-squared = 5, df = 2, p-value = 0.08208
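
If only the numbers are needed rather than the printed reports, one possible follow-up is to pull the fields out of each result. The sketch below is an illustration, not part of the original answer: it rebuilds a list shaped like `df_list` from a hypothetical helper data frame `one_df`, and the names `results`, `p_values`, `statistics` and `summary_table` are assumptions.

```r
# Hypothetical stand-in for one element of df_list (values from the question)
one_df <- data.frame(
  Person = rep(c("Alex", "Gerard", "Clyde"), times = 2),
  Height = c(175L, 180L, 179L, 175L, 180L, 179L),
  Weight = c(75L, 85L, 79L, 75L, 85L, 79L)
)
df_list <- list(`1.3.A` = one_df, `2.2.A` = one_df, `1.1.B` = one_df)

# Run the test on every data frame; each element is an "htest" object
results <- lapply(df_list, function(i) kruskal.test(Height ~ Person, data = i))

# vapply() enforces that each extraction returns exactly one number
p_values   <- vapply(results, function(res) res$p.value, numeric(1))
statistics <- vapply(results, function(res) unname(res$statistic), numeric(1))

# Collect everything in one table, one row per data frame
summary_table <- data.frame(
  dataset   = names(results),
  statistic = statistics,
  p_value   = p_values,
  row.names = NULL
)
print(summary_table)
```

`vapply()` is used here instead of `sapply()` so that an unexpected result shape raises an error rather than silently changing the return type.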
jdobres
  • purrr/tidyverse version: `map(df_list, ~kruskal.test(Height ~ Person, data = .))` – andrew_reece Aug 21 '21 at 17:51
  • @jdobres Thank you very much!! You are a life saver!! In my head, I was referencing the list. Thank you very much for the explanation. Right now, I am having a hard time knowing when to reference a particular argument in a given function in R and when not to. – Tyler Ruddenfort Aug 21 '21 at 18:00
  • @andrew_reece, Thank you for this solution as well. :) – Tyler Ruddenfort Aug 21 '21 at 18:04