Apply wilcox.test to all paired columns in a dataframe

Question

For my M.Sc. project I am trying to order all columns of a user-supplied dataframe by their median, apply wilcox.test to the columns following a specific schema (described below), and then plot each column's values in a box-and-whisker plot.

The ordering and the plotting work just fine, but I am having trouble finding a way to apply wilcox.test to the dataframe in the following schema:

wilcox.test(i, j, paired=TRUE)

where i = 1 and j = 2 initially, and both are incremented until j = ncol(dataframe). So I want to run the function on columns 1 and 2, then on columns 2 and 3, and so on, until j is the last column of the dataframe.

I also want to store all the p-values in a dataframe with one row (containing the p-values), with each column named after the two columns that were passed to the corresponding wilcox.test. I don't only want to plot all the columns (each representing a "solution"); I also want to print the p-value for each test in the console (something like: "Wilcoxon-test with 'Solution1' and 'Solution2' resulted in the p-value: 'p-value from wilcox.test of Solution1 and Solution2', which means the solutions are/aren't significantly different").

I tried to adapt code from other posts concerning this problem, but nothing worked. Unfortunately I am also very inexperienced in R, so I hope that what I wrote above wasn't utter bullsh*t. I also tried to iterate over the columns of the dataframe with for-loops and increments in a Java manner, as Java is the only programming language I have been taught, but that didn't work at all (what a surprise).

The plot my code creates based on a dataframe with very different values (image not included):

Thanks for any advice you can give me, it's very much appreciated!

Creating a good reprex will enhance your chance to receive appropriate help from the community: https://stackoverflow.com/help/minimal-reproducible-example — Alessio, Jan 10 '21 at 20:27

Karolis Koncevičius · Accepted Answer · 2021-01-10T20:55:08.353

Seems like a job for the matrixTests package. Here is a demonstration using the iris dataset:

library(matrixTests)
col_wilcoxon_twosample(iris[,1:3], iris[,2:4])

             obs.x obs.y obs.tot statistic       pvalue alternative location.null exact corrected
Sepal.Length   150   150     300   22497.5 9.812123e-51   two.sided             0 FALSE      TRUE
Sepal.Width    150   150     300    7793.5 4.151103e-06   two.sided             0 FALSE      TRUE
Petal.Length   150   150     300   19348.5 3.735718e-27   two.sided             0 FALSE      TRUE

The returned results match wilcox.test() done on each pair. For example, 1st vs 2nd columns:

w <- wilcox.test(iris[,1], iris[,2])

w$p.value
[1] 9.812123e-51
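Note that the calls above run the unpaired (two-sample) test, while the question asks for paired = TRUE (matrixTests also appears to provide col_wilcoxon_paired for that case). As a rough base-R sketch of the schema from the question, the following runs a paired test on each pair of adjacent columns, stores the p-values in a one-row dataframe named after the column pairs, and prints one message per test. The helper name adjacent_wilcox and the 0.05 cutoff are assumptions made for illustration, not part of the original question or answer.

```r
# Hypothetical helper: paired Wilcoxon tests on columns (1,2), (2,3), ...
# of a numeric dataframe. Extra arguments are passed on to wilcox.test().
adjacent_wilcox <- function(dat, ...) {
  stopifnot(ncol(dat) >= 2)
  i <- seq_len(ncol(dat) - 1)          # index of the left column of each pair
  p <- vapply(i, function(k) {
    wilcox.test(dat[[k]], dat[[k + 1]], paired = TRUE, ...)$p.value
  }, numeric(1))
  names(p) <- paste(names(dat)[i], names(dat)[i + 1], sep = " vs ")
  # one row, one column per test; check.names = FALSE keeps the " vs " names
  data.frame(t(p), check.names = FALSE)
}

# exact = FALSE uses the normal approximation, which avoids warnings about ties
pvals <- adjacent_wilcox(iris[, 1:4], exact = FALSE)

alpha <- 0.05                          # assumed significance level
for (nm in names(pvals)) {
  p <- pvals[[nm]]
  verdict <- if (p < alpha) "are" else "are not"
  cat(sprintf("Wilcoxon test of %s: p = %.3g, so the solutions %s significantly different\n",
              nm, p, verdict))
}
```

Because the helper only uses ncol(dat) and names(dat), it should work for any number of columns without hardcoding the pairs.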

edited Jan 10 '21 at 20:55

answered Jan 10 '21 at 20:35

Karolis Koncevičius


Thanks a lot for the answer! Does this also work with a variable number of columns? So that I don't have to hardcode something like your "w <- wilcox.test(iris[,1], iris[,2])", and the code instead does that for all pairs of columns in the schema I described. – nostripe Jan 11 '21 at 09:05
Sure, just do `col_wilcoxon_twosample(dat[,1:(ncol(dat)-1)], dat[,2:ncol(dat)])` – Karolis Koncevičius Jan 11 '21 at 17:42
Awesome, thanks a lot for solving my problem! – nostripe Jan 12 '21 at 14:13
