1

I'm trying to make this function work, but am failing. What I need is a function that reads the names from a dataframe columns and uses them to perform a Wilcoxon test on each of those columns. "result" would be the main final product, a table with the genus names and their p-values on each row. I've added also a plotting feature for visualizing the values among groups for each column, that I would save naming them after the corresponding genus.

library("dplyr")
library("ggpubr")
library(PairedData)
library(tidyr)
         
    process <- function(data, genus){
          group_by(data,group) %>%summarise(
              count = n(),
              median = median(genus, na.rm = TRUE),
              IQR = IQR(genus, na.rm = TRUE)
            )
          # Subset data before and after treatment
          T0 <- subset(data,  group == "T0", genus,drop = TRUE)
          T2 <- subset(data,  group == "T2", genus,drop = TRUE)
          #Wilcoxon test for paired data, I want a table of names and corresponding p-values
          res <- wilcox.test(T0, T2, paired = TRUE)
          res$p.value
          result <- spread(genus,res$p.value)
          # Plot paired data, with title depending on the data and its p-value (this last one could be optional)
          pd <- paired(T0, T2)
          tiff(genus".tiff", width = 600, height = 400)
           plot(pd, type = "profile") + labs(title=print(data[,genus]", paired p-value="res[,p.value]) +theme_bw()
          dev.off()
      }
        
        l <- length(my_data)
        glist <- list(colnames(my_data[3:l])) #bacteria start at col 3
        wilcoxon <- process(data = my_data, genus = glist)

A reproducible dataset could be

my_data    
    Patient group   Subdoligranulum Agathobacter
    pt_10T0 T0  0.02    0.00 
    pt_10T2 T2  10.71   19.89 
    pt_15T0 T0  29.97   0.28 
    pt_15T2 T2  16.10   7.70 
    pt_20T0 T0  2.39    0.44 
    pt_20T2 T2  20.48   3.35 
    pt_32T0 T0  12.23   0.17 
    pt_32T2 T2  37.11   1.87 
    pt_36T0 T0  0.64    0.03 
    pt_36T2 T2  0.02    0.08 
    pt_39T0 T0  0.04    0.01 
    pt_39T2 T2  0.36    0.05 
    pt_3t0  T0  13.23   1.34 
    pt_3T2  T2  19.22   1.51 
    pt_9T0  T0  11.69   0.57 
    pt_9T2  T2  34.56   3.52 

I'm not very familiar with functions, and haven't found yet a good tutorial on how to make them from a dataframe... so this is my best attempt, I hope some of you can make it work. Thank you for the help!

camcecc10
  • 47
  • 6

1 Answers1

0

Simply, return the needed value at end of processing. Below does not test the plot step (with unknown packages) but adjusted for proper R grammar:

proc_wilcox <- function(data, genus){
    # Subset data before and after treatment
    T0 <- data[[genus]][data$group == "T0"]
    T2 <- data[[genus]][data$group == "T2"]

    # Wilcoxon test for paired data
    res <- wilcox.test(T0, T2, paired = TRUE)

    # Plot paired data, with title depending on the data and its p-value
    # pd <- paired(T0, T2)
    # tiff(paste0(genus, ".tiff"), width = 600, height = 400)
    # plot(pd, type = "profile") + 
    #   labs(title=paste0(genus, " paired p-value= ", res$p.value)) + 
    #   theme_bw()
    # dev.off()

    return(res$p.value)
}

Then, call the method with an apply function such as sapply or slightly faster vapply designed to process across iterables and return same length.

# VECTOR OF RESULTS (USING sapply)
wilcoxon_results <- sapply(
  names(my_data)[3:ncol(my_data)], 
  function(col) proc_wilcox(my_data, col)
)

# VECTOR OF RESULTS (USING vapply)
wilcoxon_results <- vapply(
  names(my_data)[3:ncol(my_data)], 
  function(col) proc_wilcox(my_data, col),
  numeric(1)
)

wilcoxon_results
# Subdoligranulum    Agathobacter 
#       0.1484375       0.0078125 

wilcoxon_df <- data.frame(wilcoxon_results)
wilcoxon_df
#                 wilcoxon_results
# Subdoligranulum        0.1484375
# Agathobacter           0.0078125
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thank you so much, your adjustments and the apply part do the trick! Could you please explain this line "function(col) proc_wilcox(my_data, col)", what is "col" and how is it working? – camcecc10 Feb 17 '22 at 10:47
  • In *apply* functions, when you pass an anonymous `function` call, you pass *any* named variable into it. Here `col` was chosen which represents the iterating element from the iterable: `names(my_data)[3:ncol(my_data)]`. – Parfait Feb 17 '22 at 18:40