0

I'm seeking to create something like this:

Example Output

Using my own data, I would be specifically using the p-values I found here:

KS test p-values

I was able to produce something similar, albeit with the incorrect method. Specifically, I was able to produce something similar using a T-test:

T test p-value

I produced this by writing this code:

l<- ggplot(VioPos, aes(x=Regulation, y=Score,fill=Regulation)) +
  geom_violin(trim=FALSE)+
  labs(title="Plot of ARE Scores by Regulation",x="Gene Regulation", y = "ARE Score")+
  geom_boxplot(width=0.1,fill="white")+
  theme_classic()
l

dp <- l +  scale_y_continuous(trans="log2")
dp



dp7 <- dp +
  stat_compare_means(comparisons=my_comparisons, method="t.test")
dp7

In other words, I utilized stat_compare_means() using ggplot2/tidyverse/ggpubr/rstatix.

However, if I modify the method in the code, it seems to display correctly for Wilcoxon and T tests, but not for ANOVA and Kruskal-Wallis tests. Moreover, it seems that stat_compare_means() only supports those four and not KS, but I'm specifically interested in plotting mean comparisons from my KS test output onto my violin plots. Is there some other package I can use?

Also please note: for the KS test, the "UpScorePos" "DownScorePos" etc. was to compare ARE score by regulation (as I did with the graphs in the T test).

Shawn Hemelstrand
  • 2,676
  • 4
  • 17
  • 30
glico
  • 3
  • 4

1 Answers1

0

You can get the p-value from a KS-test like this:

x <- rnorm(100)
y <- rnorm(100)
res <- ks.test(x, y)
res$p.value
[1] 0.9670685

Just use this p-value and add it to your plots.

EDIT: A somewhat hacky solution is to use run a t-test and get the right data structure that can be used with stat_pvalalue_manual and insert the pvalues from a ks.test. See the example below (I used the ToothGrowth data as an example).

# Transform `dose` into factor variable
df <- ToothGrowth
df$dose <- as.factor(df$dose)

stat.test <- df %>%
  t_test(len ~ dose)
stat.test

# prepare test tibble for ks.test
stat.test <- df %>%
  t_test(len ~ dose)
stat.test <- stat.test %>% add_y_position()
stat.test

kst <- stat.test # copy tibble to overwrite p-values for ks.test

p1 <- ks.test(x = ToothGrowth$len[ToothGrowth$dose == 0.5],
              y = ToothGrowth$len[ToothGrowth$dose == 1]
)$p
p2 <- ks.test(x = ToothGrowth$len[ToothGrowth$dose == 0.5],
              y = ToothGrowth$len[ToothGrowth$dose == 2]
)$p
p3 <- ks.test(x = ToothGrowth$len[ToothGrowth$dose == 1],
              y = ToothGrowth$len[ToothGrowth$dose == 2]
)$p

kst[, 'p'] <- as.numeric(c(p1, p2, p3))

ggplot(df, aes(x = dose, y = len)) +
  geom_violin(trim = F) +
  stat_pvalue_manual(kst, label = "p = {p}")

violing plot with ks-test p-values

tester
  • 1,662
  • 1
  • 10
  • 16
  • And for my case as illustrated above, where I have (x,y,z) : I already have the p-values from my KS tests that I'm interested in, but I'm not sure how to plot it. I don't think I can use a similar method where I used stat_compare_means(), can I? Since it appears that it only supports a limited number of methods? – glico Mar 03 '21 at 00:11
  • Exactly, it seems like `stat_compare_means()` doesn't support a `ks.test`. However, you can take a look at `stat_pvalue_manual()`, you'll find a description here: https://www.datanovia.com/en/blog/how-to-add-p-values-to-ggplot-facets/ – tester Mar 03 '21 at 00:34
  • I'm not sure if the method posted above works for stat_pvalue_manual either (just taking a look at the arguments). Moreover, when I did my KS test originally (in that image), I didn't really have a good way to do it, so I went about it in a relatively complicated fashion: I actually originally had three different data frames (for non, up, and down regulated genes) and so I simply selected "Score" from each data frame, and then I stored them in their own variables (UpScoresPos, DownScoresPos, NonScoresPos). [con't] – glico Mar 03 '21 at 01:27
  • [con't] Then to actually do the ks test, I needed to as.numeric(unlist()) for each of those three stored variables, otherwise ks.test wouldn't work. The examples ( https://rpkgs.datanovia.com/ggpubr/reference/stat_pvalue_manual.html ) are slightly confusing for my case. I've stored each p-value as in your example in their own variables, I called it u,v, and w and stored then stored those in their own vector called pvals, but now I'm not sure what to do from here? – glico Mar 03 '21 at 01:32