I have multiple cancer datasets, with genes as rows and samples as columns. Each dataset has samples that responded to the therapy, labeled Response, and samples that did not respond, labeled NoResponse.
I'm trying to assess the difference between those two groups in each dataset separately. One way of doing that is by checking the difference in the expression of important genes. I'm doing that by using boxplots with facet_wrap.
I'm using the Wilcoxon test, and with each gene that I run, I get some warnings like this:
Warning messages:
1: Removed 39 rows containing non-finite values (stat_signif).
2: In wilcox.test.default(c(1904, 736, 26, 43, 420, 336, 105, 569, :
cannot compute exact p-value with ties
3: In wilcox.test.default(c(162, 23, 94, 25, 22, 1, 19, 148, 76, 48, :
cannot compute exact p-value with ties
4: In wilcox.test.default(c(0.0143552929770701, 0.739848102699327, :
cannot compute exact p-value with ties
5: In wilcox.test.default(c(110, 204, 164, 437), c(33, 15, 239, 65, :
cannot compute exact p-value with ties
6: In wilcox.test.default(c(14, 96, 8, 31, 89, 1, 168, 20, 574, 0, :
cannot compute exact p-value with ties
7: Removed 39 rows containing non-finite values (stat_boxplot).
8: Position guide is perpendicular to the intended axis. Did you mean to specify a different guide `position`?
I see that some rows are being removed, so I'm losing data. What do those warnings mean? As far as I can tell, I don't have any non-finite numbers in my data. And what are the Wilcoxon ties?
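To double-check my claim about non-finite values, something like the following could be run on dt. This is only a minimal sketch: the toy data frame toy_dt and the helper check_expr are made up for illustration and are not my real data. For each dataset it counts the values that is.finite() rejects (NA, NaN, Inf, -Inf) and reports whether any expression value occurs more than once, which is what I understand a "tie" to be for a rank-based test.

```r
# Hypothetical toy data standing in for the real `dt` (values are made up)
toy_dt <- data.frame(
  expr     = c(110, 204, NA, 437, 33, 15, 15, 65),
  response = c("Response", "Response", "Response", "Response",
               "NoResponse", "NoResponse", "NoResponse", "NoResponse"),
  dataSet  = c("data1", "data1", "data1", "data1",
               "data2", "data2", "data2", "data2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-dataset count of non-finite values and a tie flag
check_expr <- function(d) {
  per_set <- split(d$expr, d$dataSet)
  data.frame(
    dataSet    = names(per_set),
    # NA, NaN, Inf and -Inf all count as non-finite
    non_finite = vapply(per_set, function(x) sum(!is.finite(x)), integer(1)),
    # a tie means the same finite value appears at least twice in the dataset
    has_ties   = vapply(per_set,
                        function(x) anyDuplicated(x[is.finite(x)]) > 0,
                        logical(1)),
    row.names  = NULL
  )
}

res <- check_expr(toy_dt)
print(res)
```

Running check_expr(dt) on the real data frame should show which datasets, if any, contribute the removed rows.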
Here is the code; I wrapped it in a function:
library(dplyr)    # provides %>% (magrittr would also work)
library(ggplot2)
library(ggpubr)   # provides stat_compare_means()

# Assumes dataset1 ... dataset21, Response (one label per sample, in the same
# order as the concatenated columns below) and my_comparisons already exist.
showgene <- function(gene) {
  # Expression of `gene` in every dataset, concatenated in a fixed order
  dt = data.frame(expr=c(t(dataset1[gene,]),t(dataset2[gene,]), t(dataset3[gene,]),t(dataset4[gene,]),
                         t(dataset5[gene,]),t(dataset6[gene,]),t(dataset7[gene,]),t(dataset8[gene,]),t(dataset9[gene,])
                         ,t(dataset10[gene,]),t(dataset11[gene,]),t(dataset12[gene,]),t(dataset13[gene,]),t(dataset15[gene,]),
                         t(dataset16[gene,]) ,t(dataset20[gene,]),t(dataset21[gene,])),
                  response=Response, dataSet=c(rep('data1',ncol(dataset1)),
                                               rep('data2',ncol(dataset2)),
                                               rep('data3',ncol(dataset3)),
                                               rep('data4',ncol(dataset4)),
                                               rep('data5',ncol(dataset5)),
                                               rep('data6',ncol(dataset6)),
                                               rep('data7',ncol(dataset7)),
                                               rep('data8',ncol(dataset8)),
                                               rep('data9',ncol(dataset9)),
                                               rep('data10',ncol(dataset10)),
                                               rep('data11',ncol(dataset11)),
                                               rep('data12',ncol(dataset12)),
                                               rep('data13',ncol(dataset13)),
                                               rep('data15',ncol(dataset15)),
                                               rep('data16',ncol(dataset16)),
                                               rep('data20',ncol(dataset20)),
                                               rep('data21',ncol(dataset21))))
  # Make sure the expression column is numeric
  dt[['expr']] = as.numeric(as.character(dt[['expr']]))
  # One panel per dataset, with a Wilcoxon test between the response groups
  facetplot = dt %>% ggplot(aes(response, expr, fill = response)) +
    facet_wrap(~dataSet, scales = 'free') +
    labs(x = 'Clinical outcome', y = 'Expression') +
    ggtitle(gene) + theme(plot.title = element_text(hjust = 0.5)) +
    stat_compare_means(comparisons = my_comparisons, vjust = 1.2, method = "wilcox.test")
  boXplots = facetplot + geom_boxplot()
  return(boXplots)
}
showgene('CD274')
The boxplot I get:
An example of dt, what it would look like more or less:
structure(list(expr = c(1484, 290, 1421, 251, 203, 888, 608,
1203, 1340, 1021, 182, 170, 291, 401, 140, 117, 582, 1177, 191,
152, 111, 24, 187, 705, 1122, 694, 224, 1122, 501, 268, 1277,
270, 705, 276, 88, 157, 2564, 25, 251, 255, 484, 96, 37, 180,
169, 949, 1477, 128, 321, 32.164880027492, 30.5002842845929,
30.3194383690632, 31.2055296895404, 31.9247612316469, 30.6333196961515,
30.0292937311801, 30.9803064773279, 30.0890307092925, 31.6247367596842,
30.2033356286348), response = c("NoResponse", "NoResponse", "Response",
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "Response",
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "NoResponse",
"NoResponse", "Response", "Response", "NoResponse", "Response",
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "NoResponse",
"Response", "NoResponse", "NoResponse", "Response", "Response",
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "NoResponse",
"NoResponse", "NoResponse", "Response", "NoResponse", "NoResponse",
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "NoResponse",
"NoResponse", "NoResponse", "NoResponse", "NoResponse", "Response",
"NoResponse", "Response", "NoResponse", "NoResponse", "NoResponse",
"Response", "Response", "NoResponse", "NoResponse", "Response",
"Response", "NoResponse"), dataSet = c("data1", "data1", "data1",
"data1", "data1", "data1", "data1", "data1", "data1", "data1",
"data1", "data1", "data1", "data1", "data1", "data1", "data1",
"data1", "data1", "data1", "data1", "data1", "data1", "data1",
"data1", "data1", "data1", "data1", "data1", "data1", "data1",
"data1", "data1", "data1", "data1", "data1", "data1", "data1",
"data1", "data1", "data1", "data1", "data1", "data1", "data1",
"data1", "data1", "data1", "data1", "data2", "data2", "data2",
"data2", "data2", "data2", "data2", "data2", "data2", "data2",
"data2")), row.names = c(NA, 60L), class = "data.frame")