I have a dataframe with 27 columns: 26 are numeric variables, and the 27th tells me which group each row belongs to. There are 7 groups in total. I'm trying to apply the Kruskal-Wallis test to each variable, split by group, to determine whether there is a significant difference between the groups.

I have tried:

df.groupby(['treatment']).apply(kruskal)

which raises the error "Need at least two groups in stats.kruskal()".

My other attempts haven't produced an output either. I'll be doing similar analyses on a regular basis and with larger datasets. Can someone help me understand this issue and how to fix it?

kynnemall

1 Answer

With SciPy, you can do this for each variable:

scipy.stats.kruskal(*[group["variable"].values for name, group in df.groupby("treatment")])
user9993950
  • ValueError: Need at least two groups in stats.kruskal() – kynnemall Aug 01 '18 at 13:30
  • How many groups do you have? What is the output if you do `df.groupby("treatment").size()` ? – user9993950 Aug 01 '18 at 14:15
  • There are 5 groups. The output is 134, 72, 128, 59, and 72 for these groups. – kynnemall Aug 01 '18 at 14:58
  • Edited my answer, the iterable needed to be expanded (by putting `*` in front). Ok now? – user9993950 Aug 01 '18 at 15:09
  • Yes, thank you. I would like to apply it to all variables in the dataframe. Is there a simple adjustment I can make to the code that will do this (without running a for loop)? – kynnemall Aug 01 '18 at 15:16
  • I think you cannot bypass a for loop on all variables for that. – user9993950 Aug 01 '18 at 15:25
  • Fair enough. Thanks for all your help. Would you mind breaking down the statement and explaining the statement inside the kruskal function? – kynnemall Aug 01 '18 at 15:26
  • `for name, group in df.groupby("treatment")` iterates through the different groups, and for each group, `group["variable"].values` selects the values of the desired column. In the end you get a list containing one array of values per group for the given variable, which you can unpack with `*` and pass to the kruskal function. – user9993950 Aug 01 '18 at 15:37
  • Thank you so much for explaining it to me. It really helps me learn when I know what is going on inside a function; then I should be able to fix it if something goes wrong another time. – kynnemall Aug 01 '18 at 15:41
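
To tie the thread together, here is a minimal sketch of the loop over all variables discussed in the comments. As far as I can tell, the original `df.groupby(['treatment']).apply(kruskal)` fails because `apply` hands `kruskal` one sub-dataframe at a time, so the test only ever sees a single sample. The helper name `kruskal_by_column`, the column names `var1`/`var2`, and the synthetic data are all made up for illustration; only the `treatment` column name comes from the question. Dropping NaNs per group is also an assumption about how missing values should be handled.

```python
import numpy as np
import pandas as pd
from scipy import stats


def kruskal_by_column(df, group_col):
    """Run a Kruskal-Wallis test on each numeric column, split by group_col.

    Hypothetical helper for illustration; not part of pandas or SciPy.
    """
    grouped = df.groupby(group_col)
    # Every numeric column except the grouping column itself
    value_cols = df.select_dtypes(include="number").columns.drop(
        group_col, errors="ignore"
    )
    rows = {}
    for col in value_cols:
        # One array per group for this column (NaNs dropped: an assumption)
        samples = [group[col].dropna().to_numpy() for _, group in grouped]
        # The * unpacks the list so each group is a separate argument
        statistic, pvalue = stats.kruskal(*samples)
        rows[col] = {"statistic": statistic, "pvalue": pvalue}
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic demo data: var1 has no group effect, var2 has a strong one
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "treatment": np.repeat(["a", "b", "c"], 30),
    "var1": rng.normal(size=90),
    "var2": np.concatenate([
        rng.normal(0, 1, 30),
        rng.normal(3, 1, 30),
        rng.normal(6, 1, 30),
    ]),
})

results = kruskal_by_column(demo, "treatment")
print(results)
```

The result is a dataframe with one row per variable, which is convenient to sort or filter by `pvalue`. The loop is still there, but it runs once per column rather than once per row, so it should stay cheap even on larger datasets.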