Cannot compute exact p-value with ties while using boxplots

Question

I have multiple cancer datasets, with genes as rows and samples as columns. Each dataset has samples that responded to the therapy, labeled Response, and samples that did not respond, labeled NoResponse.

I'm trying to assess the difference between those two groups in each dataset separately. One way of doing that is to compare the expression of important genes between the groups. I'm doing that with boxplots and facet_wrap, one panel per dataset.

I'm using the Wilcoxon test, and for each gene that I plot, I get warnings like these:

Warning messages:
1: Removed 39 rows containing non-finite values (stat_signif). 
2: In wilcox.test.default(c(1904, 736, 26, 43, 420, 336, 105, 569,  :
  cannot compute exact p-value with ties
3: In wilcox.test.default(c(162, 23, 94, 25, 22, 1, 19, 148, 76, 48,  :
  cannot compute exact p-value with ties
4: In wilcox.test.default(c(0.0143552929770701, 0.739848102699327,  :
  cannot compute exact p-value with ties
5: In wilcox.test.default(c(110, 204, 164, 437), c(33, 15, 239, 65,  :
  cannot compute exact p-value with ties
6: In wilcox.test.default(c(14, 96, 8, 31, 89, 1, 168, 20, 574, 0,  :
  cannot compute exact p-value with ties
7: Removed 39 rows containing non-finite values (stat_boxplot). 
8: Position guide is perpendicular to the intended axis. Did you mean to specify a different guide `position`?

Some rows are being removed, so it looks like I'm losing data. What do these warnings mean? As far as I can tell, I don't have any non-finite numbers in my data. And what are ties in the Wilcoxon test?

Here is the code, wrapped in a function:
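For context, the ties warning can be reproduced outside ggplot. The following is a minimal sketch on made-up numbers (not the real expression data), assuming only base R's wilcox.test. A tie is a value that occurs more than once in the pooled samples; when that happens the exact p-value is not available, and R warns and falls back to a normal approximation.

```r
# Toy values, not taken from the real datasets.
# 25 appears in both groups, so the pooled sample contains a tie.
resp   <- c(110, 204, 164, 25)
noresp <- c(33, 25, 239, 65, 12)

# Run the default test and collect any warning instead of printing it.
caught <- character(0)
res_default <- withCallingHandlers(
  wilcox.test(resp, noresp),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(caught)

# Requesting the approximation explicitly avoids the warning;
# the p-value is the one the default call already fell back to.
res_approx <- wilcox.test(resp, noresp, exact = FALSE)
print(c(default = res_default$p.value, approx = res_approx$p.value))
```

So the warning does not mean the test failed, only that the reported p-value is approximate. If that is acceptable, ggpubr's stat_compare_means() has a method.args argument, and something like method.args = list(exact = FALSE) should pass the option through to the test; that part is an assumption and has not been checked against the plot in this question.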

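# Packages the function relies on (assumed; the original post does not show them)
library(ggplot2)   # ggplot, facet_wrap, geom_boxplot
library(ggpubr)    # stat_compare_means
library(magrittr)  # %>%

showgene <- function(gene) {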
  
  dt = data.frame(expr=c(t(dataset1[gene,]),t(dataset2[gene,]), t(dataset3[gene,]),t(dataset4[gene,]),
                                 t(dataset5[gene,]),t(dataset6[gene,]),t(dataset7[gene,]),t(dataset8[gene,]),t(dataset9[gene,])
                                 ,t(dataset10[gene,]),t(dataset11[gene,]),t(dataset12[gene,]),t(dataset13[gene,]),t(dataset15[gene,]),
                                 t(dataset16[gene,]) ,t(dataset20[gene,]),t(dataset21[gene,])),
                          response=Response,  dataSet=c(rep('data1',ncol(dataset1)),
                                                      rep('data2',ncol(dataset2)),
                                                      rep('data3',ncol(dataset3)),
                                                      rep('data4',ncol(dataset4)),
                                                      rep('data5',ncol(dataset5)),
                                                      rep('data6',ncol(dataset6)),
                                                      rep('data7',ncol(dataset7)),
                                                      rep('data8',ncol(dataset8)),
                                                      rep('data9',ncol(dataset9)),
                                                      rep('data10',ncol(dataset10)),
                                                      rep('data11',ncol(dataset11)),
                                                      rep('data12',ncol(dataset12)),
                                                      rep('data13',ncol(dataset13)),
                                                      rep('data15',ncol(dataset15)),
                                                      rep('data16',ncol(dataset16)),
                                                      rep('data20',ncol(dataset20)),
                                                      rep('data21',ncol(dataset21))))
  
  dt[['expr']] = as.numeric(as.character(dt[['expr']]))
  
  facetplot = dt %>% ggplot(aes(response, expr, fill = response)) +
    facet_wrap(~dataSet, scales = 'free') + 
    labs(x = 'Clinical outcome', y = 'Expression') +
    ggtitle(gene) + theme(plot.title = element_text(hjust = 0.5)) +
    stat_compare_means(comparisons = my_comparisons, vjust = 1.2, method = "wilcox.test") 
  
  
  boXplots = facetplot + geom_boxplot()
  return(boXplots)
  
}

showgene('CD274')
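The "Removed 39 rows containing non-finite values" warning refers to NA, NaN, or infinite values in the plotted column. A minimal sketch of how one might locate them per dataset is below. It runs on a made-up stand-in for `dt` (the name `dt_toy` and its values are hypothetical), and it assumes the values arrive as text and are coerced with as.numeric, as in the function above.

```r
# Toy stand-in for `dt`; the values are made up for illustration.
dt_toy <- data.frame(
  expr    = c("1484", "290", NA, "NA", "Inf", "30.5"),
  dataSet = c("data1", "data1", "data2", "data2", "data3", "data3"),
  stringsAsFactors = FALSE
)

# Same coercion step as in showgene(); anything that is not a number becomes NA.
dt_toy$expr <- suppressWarnings(as.numeric(dt_toy$expr))

# Count non-finite values (NA, NaN, Inf, -Inf) in each dataset.
n_bad <- tapply(!is.finite(dt_toy$expr), dt_toy$dataSet, sum)
print(n_bad)
```

Running the same tapply() call on the real `dt` would show which datasets contribute the removed rows. One possible cause, which is only a guess here, is that the gene is missing from one of the datasets, so that dataset[gene, ] returns a row of NA for every sample.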

The boxplot I get:

An example of roughly what `dt` looks like:

structure(list(expr = c(1484, 290, 1421, 251, 203, 888, 608, 
1203, 1340, 1021, 182, 170, 291, 401, 140, 117, 582, 1177, 191, 
152, 111, 24, 187, 705, 1122, 694, 224, 1122, 501, 268, 1277, 
270, 705, 276, 88, 157, 2564, 25, 251, 255, 484, 96, 37, 180, 
169, 949, 1477, 128, 321, 32.164880027492, 30.5002842845929, 
30.3194383690632, 31.2055296895404, 31.9247612316469, 30.6333196961515, 
30.0292937311801, 30.9803064773279, 30.0890307092925, 31.6247367596842, 
30.2033356286348), response = c("NoResponse", "NoResponse", "Response", 
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "Response", 
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "NoResponse", 
"NoResponse", "Response", "Response", "NoResponse", "Response", 
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "NoResponse", 
"Response", "NoResponse", "NoResponse", "Response", "Response", 
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "NoResponse", 
"NoResponse", "NoResponse", "Response", "NoResponse", "NoResponse", 
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "NoResponse", 
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "Response", 
"NoResponse", "Response", "NoResponse", "NoResponse", "NoResponse", 
"Response", "Response", "NoResponse", "NoResponse", "Response", 
"Response", "NoResponse"), dataSet = c("data1", "data1", "data1", 
"data1", "data1", "data1", "data1", "data1", "data1", "data1", 
"data1", "data1", "data1", "data1", "data1", "data1", "data1", 
"data1", "data1", "data1", "data1", "data1", "data1", "data1", 
"data1", "data1", "data1", "data1", "data1", "data1", "data1", 
"data1", "data1", "data1", "data1", "data1", "data1", "data1", 
"data1", "data1", "data1", "data1", "data1", "data1", "data1", 
"data1", "data1", "data1", "data1", "data2", "data2", "data2", 
"data2", "data2", "data2", "data2", "data2", "data2", "data2", 
"data2")), row.names = c(NA, 60L), class = "data.frame")

score 0 · Answer 1 · answered Aug 31 '22 at 15:58

In the first line of your function, I see:

dt = data.frame(expr=c(t(dataset1,[gene,])...

Shouldn't it be

dt = data.frame(expr=c(t(dataset1[gene,])...

instead?

Is this a typo? I am posting this as an answer because I cannot comment yet.

answered Aug 31 '22 at 15:58

B_Heidel


No. This gives the expression of the wanted gene in each sample. – Programming Noob Aug 31 '22 at 16:11
