
I need to perform a Mann-Whitney test across all genes in R. The input is a text file in which the first row contains the sample names, the second row contains the cohort labels (1 or 2), and all remaining rows contain the gene expression values. I need to do this with a function.
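A minimal sketch of reading that layout into R. It assumes a whitespace-delimited file whose first column holds a row label: any label on the sample and cohort rows, and the gene name on each expression row. The function name `read_cohort_file` and this first-column convention are assumptions, not something the question specifies.

```r
# Hypothetical reader for the layout described in the question:
#   row 1:  a label, then the sample names
#   row 2:  a label, then one cohort code (1 or 2) per sample
#   row 3+: the gene name, then that gene's expression values
read_cohort_file <- function(path, sep = "") {
  raw <- read.table(path, header = FALSE, sep = sep,
                    stringsAsFactors = FALSE)
  samples <- as.character(unlist(raw[1, -1]))
  cohort  <- as.integer(unlist(raw[2, -1]))
  # text sample names in row 1 usually make read.table read the value
  # columns as character, so convert the gene rows to a numeric matrix
  vals <- as.matrix(raw[-(1:2), -1, drop = FALSE])
  expr <- matrix(as.numeric(vals), nrow = nrow(vals),
                 dimnames = list(raw[-(1:2), 1], samples))
  list(cohort = cohort, expr = expr)  # expr is genes x samples
}
```

For a tab-delimited file, `sep = "\t"` would be passed instead of the default.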

The final output should be a results table with genes as rows and the following columns: mean expression of cohort 1, mean expression of cohort 2, fold change (FC), and Mann-Whitney p-value.
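One way to produce that table, sketched as a function that takes a numeric genes x samples matrix `expr` (gene names as row names) and a vector `cohort` with one 1/2 label per sample. In R, `wilcox.test` with the default `paired = FALSE` is the Wilcoxon rank-sum test, which is equivalent to the Mann-Whitney U test. The function name `mw_table` and the FC direction (cohort 1 over cohort 2, as in the posted code) are assumptions.

```r
# Sketch: per-gene Mann-Whitney table.
# expr:   numeric matrix, genes in rows (named), samples in columns
# cohort: one label (1 or 2) per column of expr
mw_table <- function(expr, cohort) {
  stopifnot(ncol(expr) == length(cohort))
  in1 <- cohort == 1
  in2 <- cohort == 2
  res <- t(apply(expr, 1, function(x) {
    m1 <- mean(x[in1])
    m2 <- mean(x[in2])
    # unpaired Wilcoxon rank-sum (Mann-Whitney), normal approximation
    p <- wilcox.test(x[in1], x[in2], exact = FALSE)$p.value
    c(Mean_Cohort1 = m1, Mean_Cohort2 = m2, FC = m1 / m2, MW_Pvalue = p)
  }))
  data.frame(Gene = rownames(expr), res,
             row.names = NULL, stringsAsFactors = FALSE)
}

# Usage with made-up values: 2 genes, 3 samples per cohort
expr <- rbind(G1 = c(10, 12, 11, 30, 28, 31),
              G2 = c(5, 6, 7, 5, 6, 7))
cohort <- c(1, 1, 1, 2, 2, 2)
res <- mw_table(expr, cohort)
print(res)
```

Note that a gene whose cohort-2 mean is 0 gets an FC of `Inf` (or `NaN` if both means are 0).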

This is what I tried with demo data, but it doesn't seem to work. The output has only one row, for gene G4, and the columns for mean expression of cohort 1, mean expression of cohort 2, FC and Mann-Whitney p-value show "NaN" instead of values.

```r
data <- read.table(text = "
Cohort  Gene  S1     S2     S3     S4    S5
1       G1    1389   1097   1501   4630  2011
2       G2    1023   880    492    4411  1233
1       G3    2847   2717   2814   4145  5433
2       G4    20612  18123  17679  4099  8567
", header = TRUE)

#separate cohort 1 and 2
cohort1<-data[data$Cohort != "2", 1]
#head(cohort1)
cohort2<-data[data$Cohort != "1", 1]

geneNames <- data$Gene
row.names(data) <- data$Gene

df <- data.frame()

for (Gene in 1:length(geneNames)){

  if (sum(cohort1) | sum(cohort2) > 0){
    mwt <- wilcox.test(x = cohort1, y = cohort2, paired = T, exact = F, conf.int = F)
  } else if (sum(cohort1) | sum(cohort2) == 0){
    mwt <- data.frame("p.value" = NA, "conf.int" = NA)
  }

  table <- data.frame("Gene" = geneNames[Gene],"Mean_Cohort1" = mean(cohort1),
                      "Mean_Cohort2" = mean(cohort2),"FC" = mean(cohort1)/mean(cohort2), "MW_Pvalue" = mwt$p.value)
  output <- rbind(df, table)

}
```
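A sketch of the posted loop with its visible problems corrected, assuming the layout described at the top (one cohort label per sample). The demo data labels genes rather than samples, so the per-sample `cohort` vector below (S1-S3 in cohort 1, S4-S5 in cohort 2) is hypothetical. The changes: each gene's own expression values are split by cohort, instead of taking column 1 (`Cohort`) of `data`; the condition is written as `sum(x1) > 0 || sum(x2) > 0`, because `a | b > 0` is parsed as `a | (b > 0)`; `paired = FALSE` is used, since `paired = TRUE` runs the Wilcoxon signed-rank test rather than the Mann-Whitney test; and the rows are collected in a list and combined once, because `output <- rbind(df, table)` never updates `df`, so each iteration starts from the empty data frame and only the last gene (G4) is left.

```r
# Demo expression values from the question, as a genes x samples matrix
expr <- rbind(
  G1 = c(1389, 1097, 1501, 4630, 2011),
  G2 = c(1023, 880, 492, 4411, 1233),
  G3 = c(2847, 2717, 2814, 4145, 5433),
  G4 = c(20612, 18123, 17679, 4099, 8567)
)
colnames(expr) <- paste0("S", 1:5)

# HYPOTHETICAL per-sample cohort labels (not given in the demo data)
cohort <- c(1, 1, 1, 2, 2)

rows <- vector("list", nrow(expr))
for (i in seq_len(nrow(expr))) {
  # this gene's values, split by the cohort of each sample
  x1 <- expr[i, cohort == 1]
  x2 <- expr[i, cohort == 2]

  # explicit comparisons; `a | b > 0` would be parsed as `a | (b > 0)`
  if (sum(x1) > 0 || sum(x2) > 0) {
    # paired = FALSE: Wilcoxon rank-sum, i.e. the Mann-Whitney test
    p <- wilcox.test(x1, x2, paired = FALSE, exact = FALSE)$p.value
  } else {
    p <- NA
  }

  rows[[i]] <- data.frame(Gene = rownames(expr)[i],
                          Mean_Cohort1 = mean(x1),
                          Mean_Cohort2 = mean(x2),
                          FC = mean(x1) / mean(x2),
                          MW_Pvalue = p,
                          stringsAsFactors = FALSE)
}

# combine all per-gene rows once, after the loop
output <- do.call(rbind, rows)
print(output)
```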

Can anybody help me out with this?

Ankita
  • Your description of the data doesn't match the example. Are S1-S5 samples? What are you testing? You seem to want a paired test; what are the pairs? From the example data it looks like you have different genes measured for cohorts 1 and 2. If that's really true, you can't compare them. Besides, is there a reason why you want to use the Mann-Whitney test on raw gene expression values instead of going the more usual route of normalizing and log-transforming the data and doing a t-test? – Cloudberry Nov 24 '22 at 20:06
  • Yes, S1-S5 are samples, and yes, the description doesn't match, because I used different input data to check whether the approach works. But I do need to input the file exactly as described. I am new to the Mann-Whitney test, so I am trying to understand how it works using this demo data; I am not sure whether the input data description is sufficient to do it. Could you suggest how to do it if I need a paired test? – Ankita Nov 27 '22 at 11:13
  • It's still not clear how your data is structured, particularly how the samples and cohorts relate to each other, and what hypothesis you are testing; in other words, what the cases and controls are. – Cloudberry Nov 27 '22 at 17:18
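Two per-gene alternatives raised in the comments, sketched under assumptions. The paired Wilcoxon signed-rank test only makes sense if each cohort-1 sample has a matched cohort-2 sample; the `pair_id` vector that identifies those matches is hypothetical, since the question does not say how samples would be paired. The log-transform-then-t-test route assumes the values are already normalized; the +1 pseudocount and Welch's t-test (the default in R's `t.test`) are choices made here. The function names are also hypothetical.

```r
# Alternative 1: paired Wilcoxon signed-rank test per gene.
# HYPOTHETICAL: pair_id[j] names the subject that sample j came from; every
# subject must contribute exactly one cohort-1 and one cohort-2 sample.
paired_wilcox_pvalues <- function(expr, cohort, pair_id) {
  s1 <- which(cohort == 1)
  s2 <- which(cohort == 2)
  # order both cohorts by subject so that position k forms a pair
  s1 <- s1[order(pair_id[s1])]
  s2 <- s2[order(pair_id[s2])]
  stopifnot(identical(pair_id[s1], pair_id[s2]))
  apply(expr, 1, function(x) {
    wilcox.test(x[s1], x[s2], paired = TRUE, exact = FALSE)$p.value
  })
}

# Alternative 2: Welch t-test per gene on log2(x + 1) values.
# Assumes expr has already been normalized across samples.
log_t_pvalues <- function(expr, cohort) {
  logged <- log2(expr + 1)
  apply(logged, 1, function(x) {
    t.test(x[cohort == 1], x[cohort == 2])$p.value
  })
}

# Usage with made-up values: 2 genes, 3 subjects (A, B, C) x 2 cohorts
expr <- rbind(G1 = c(10, 20, 11, 25, 12, 30),
              G2 = c(5, 6, 7, 5, 6, 7))
cohort  <- c(1, 2, 1, 2, 1, 2)
pair_id <- c("A", "A", "B", "B", "C", "C")
p_paired <- paired_wilcox_pvalues(expr, cohort, pair_id)
p_logt   <- log_t_pvalues(expr, cohort)
print(p_paired)
print(p_logt)
```

With only a handful of pairs the signed-rank test has very little power, so the paired p-values here mainly show the mechanics.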
