
I need to compare the values in every block of four rows to a single value (as mu) using a Wilcoxon signed-rank test. For example, if my data looks like this:

df1 <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715, 
      0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064, 
      0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756, 
      0.00119756260786869, 0.119069423408827, -0.0164312634285513, 
      0.0446268183579303)

     
df2 <- c(0.23340509, 0.05959987, 0.17380963, 0.14517836)

I am using wilcox.test() to compare each block of four values from df1 with the corresponding value of df2 as mu. For a data frame containing just the first four rows, it would be:

wilcox.test(dfnew$A, mu=0.23340509)$p.value
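To make the target concrete, here is a small base-R sketch that computes the signed-rank statistic V for the first block by hand and checks it against wilcox.test(). It assumes there are no tied absolute differences (true for this block), so the exact null distribution from psignrank() applies; the variable names are illustrative only.

```r
# First block of four values from df1 and its mu (the first element of df2)
x  <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715)
mu <- 0.23340509

d <- x - mu
d <- d[d != 0]                      # wilcox.test() drops zero differences
n <- length(d)
v <- sum(rank(abs(d))[d > 0])       # V = sum of ranks of the positive differences

# Exact two-sided p-value: twice the smaller tail of the null distribution of V
p_upper <- psignrank(v - 1, n, lower.tail = FALSE)  # P(V >= v)
p_lower <- psignrank(v, n)                          # P(V <= v)
p_exact <- min(1, 2 * min(p_upper, p_lower))

c(V = v, p = p_exact)
wilcox.test(x, mu = mu)$p.value     # should agree with p_exact
```

Here three of the four differences are positive, so V is the sum of their ranks, and the exact p-value matches the 0.875 reported for the first block in the answers below.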

I realise I could group every four rows using:

split(df, as.integer(gl(nrow(df), 4, nrow(df))))

I was hoping to adapt this for use with mapply() (so I could parallelise it with future.apply, given the actual size of my data frame). However, I am unsure how to specify that each block of four rows should be compared with its own value (from a separate data frame) as mu.
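A minimal base-R sketch of the mapply() route, assuming the data are the plain vectors df1 and df2 shown above. Because nrow() returns NULL for a vector, the gl() grouping from the question is built with length() instead; mapply() then walks the list of chunks and df2 in parallel, so chunk i is tested against df2[i].

```r
df1 <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715,
         0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064,
         0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756,
         0.00119756260786869, 0.119069423408827, -0.0164312634285513, 0.0446268183579303)
df2 <- c(0.23340509, 0.05959987, 0.17380963, 0.14517836)

# Group ids 1,1,1,1,2,2,2,2,... (length() instead of nrow() for a plain vector)
grp    <- as.integer(gl(length(df1), 4, length(df1)))
chunks <- split(df1, grp)

# Chunk i is paired with df2[i]; the default SIMPLIFY = TRUE returns a named numeric vector
pvals <- mapply(function(x, m) wilcox.test(x, mu = m)$p.value, chunks, df2)
pvals
```

future.apply::future_mapply() is designed as a parallel counterpart of mapply() taking the same FUN and vector arguments, so after choosing a backend with future::plan() it should be a drop-in replacement here; that parallel variant is not exercised in this sketch.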

Karolis Koncevičius

3 Answers


You could create a list of 4 elements, where each element contains four consecutive values of df1. Then, using map2() from the purrr package (part of the tidyverse), you can pair each chunk of the split list with the corresponding mu from df2:

library(tidyverse)

# split df1 into 4 chunks of 4 consecutive values each
df1split <- split(df1, ceiling(seq_along(df1) / 4))

# run the Wilcoxon signed-rank test on each chunk, pairing chunk i with df2[i]
map2(df1split, df2, ~ wilcox.test(.x, mu = .y)$p.value)

$`1`
[1] 0.875       # pvalue of first chunk 

$`2`
[1] 0.125       # pvalue of second chunk

$`3`
[1] 0.875       # pvalue of third chunk

$`4`
[1] 0.125       # pvalue of fourth chunk

To collect the p-values from the list into a data frame, you can extend the map2() call like this:

map2(df1split, df2, ~ wilcox.test(.x, mu = .y)$p.value) %>% 
  bind_rows() %>% t() %>% as.data.frame() %>% 
  rename("p-value" = "V1")

    
  p-value
1  0.875
2  0.125
3  0.875
4  0.125
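The bind_rows() %>% t() round trip works, but a named list of single p-values can also be flattened directly. A base-R sketch of the same idea, swapping in Map() (base R's counterpart of map2()) so that it runs without the tidyverse; the column names chunk and p_value are illustrative choices:

```r
df1 <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715,
         0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064,
         0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756,
         0.00119756260786869, 0.119069423408827, -0.0164312634285513, 0.0446268183579303)
df2 <- c(0.23340509, 0.05959987, 0.17380963, 0.14517836)

df1split <- split(df1, ceiling(seq_along(df1) / 4))

# Map() pairs each chunk with its mu and keeps the chunk names "1".."4"
pval_list <- Map(function(x, m) wilcox.test(x, mu = m)$p.value, df1split, df2)

# Flatten the named list straight into a data frame (no transpose needed)
pval_df <- data.frame(
  chunk   = names(pval_list),
  p_value = unlist(pval_list, use.names = FALSE)
)
pval_df
```

Within purrr itself, map2_dbl() returns a plain numeric vector instead of a list, which can be dropped into a data frame the same way.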
S-SHAAF

You can create the grouping variable with rep() and then apply the test by group:

library(data.table)

# assumes dfnew has a column A holding the df1 values, e.g.:
dfnew <- data.frame(A = df1)

setDT(dfnew)[, grp := rep(1:(.N/4), each = 4, length.out = .N)]
dfnew[, .(pval = wilcox.test(A, mu = df2[.BY$grp])$p.value), grp]

Output:

     grp  pval
   <int> <num>
1:     1 0.875
2:     2 0.125
3:     3 0.875
4:     4 0.125

Similarly, using dplyr:

dfnew %>% 
  group_by(grp = rep(1:(n()/4), each=4, length.out=n())) %>% 
  summarize(pval = wilcox.test(A,mu = df2[cur_group()$grp])$p.value)

Output:

    grp  pval
  <int> <dbl>
1     1 0.875
2     2 0.125
3     3 0.875
4     4 0.125
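The data.table and dplyr snippets above both build the group id with rep(1:(n/4), each = 4, length.out = n), which is fine when the number of rows is a multiple of four. If it is not, length.out recycles the sequence, so leftover rows silently land back in group 1 (and would be tested against df2[1]). A small base-R check with a hypothetical 10-row input compares that expression with ceiling(seq_len(n) / 4), which instead puts the leftovers in their own, shorter group:

```r
n <- 10  # hypothetical row count that is not a multiple of 4

# 1:(10/4) is 1:2, so the ids are recycled: the last two rows get group 1 again
rep_ids <- rep(1:(n / 4), each = 4, length.out = n)

# ceiling() gives consecutive blocks of four, with a shorter final block
ceiling_ids <- ceiling(seq_len(n) / 4)

rbind(rep_ids, ceiling_ids)
```

Either way, a trailing block with fewer than four rows gives the signed-rank test very little to work with, so it may be worth checking length(x) %% 4 == 0 up front.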

Alternatively, you can group directly by the mu values, repeating each element of df2 four times:

setDT(dfnew)[, .(pval = wilcox.test(A, mu=.BY$mu)$p.value), .(mu = rep(df2, each=4))]

Output:

           mu  pval
        <num> <num>
1: 0.23340509 0.875
2: 0.05959987 0.125
3: 0.17380963 0.875
4: 0.14517836 0.125
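Grouping by mu itself relies on the values in df2 being distinct: if two blocks happened to share the same mu, they would be merged into one group of eight rows and tested together. A quick base-R illustration with a hypothetical mu_vec containing a repeated value:

```r
# Hypothetical mu values; the first and third are identical
mu_vec <- c(0.1, 0.2, 0.1, 0.3)

by_mu  <- rep(mu_vec, each = 4)             # grouping key as in the approach above
by_idx <- rep(seq_along(mu_vec), each = 4)  # positional group id instead

table(by_mu)   # only 3 groups: the 0.1 group has 8 rows
table(by_idx)  # 4 groups of 4 rows each
```

Grouping by a positional id and looking up df2[grp], as in the first two snippets of this answer, avoids the problem.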

langtang

Another approach, using the matrixTests package:

library(matrixTests)

# turn each 4-observation entry into a separate row in a matrix:
m1 <- matrix(df1, ncol=4, byrow=TRUE)

# perform the test:
row_wilcoxon_onesample(m1, df2)

Output:

  obs statistic pvalue location.null alternative exact corrected
1   4         6  0.875    0.23340509   two.sided  TRUE     FALSE
2   4        10  0.125    0.05959987   two.sided  TRUE     FALSE
3   4         6  0.875    0.17380963   two.sided  TRUE     FALSE
4   4         0  0.125    0.14517836   two.sided  TRUE     FALSE
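byrow = TRUE matters here: matrix() fills column-wise by default, so without it each block of four would become a column rather than a row. As a cross-check that needs only base R, the same row-wise tests can be run with wilcox.test() on each row of m1, recovering both the statistic (V) and pvalue columns shown above:

```r
df1 <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715,
         0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064,
         0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756,
         0.00119756260786869, 0.119069423408827, -0.0164312634285513, 0.0446268183579303)
df2 <- c(0.23340509, 0.05959987, 0.17380963, 0.14517836)

# Row i holds df1[(4*i - 3):(4*i)] because the matrix is filled by row
m1 <- matrix(df1, ncol = 4, byrow = TRUE)

# One signed-rank test per row, with row i tested against df2[i]
res <- t(sapply(seq_len(nrow(m1)), function(i) {
  tst <- wilcox.test(m1[i, ], mu = df2[i])
  c(statistic = unname(tst$statistic), pvalue = tst$p.value)
}))
res
```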
Karolis Koncevičius