Finding cutoff between two overlapping density plots

Question

I have this sample data of expression of two different gene variants:

value<-cbind(c(rnorm(100,500,90),rnorm(100,800,120)))
genotype<-cbind(c(rep("A",100),rep("B",100)))
df<-cbind(value,genotype)
df<-as.data.frame(df)
colnames(df)<-c("value","genotype")
df$value<-as.numeric(as.character(df$value))

I plotted these two genotype variants by their expression and am trying to determine the optimal cutoff value of the assay that differentiates between them:

d <- density(value)
plot(d, main="Genotypes A and B", ,type="n",xlim=c(200,1100),ylim=c(0,0.005),xlab="Units of expression",ylab="")
d1 <- density(subset(value,genotype=="A"))
polygon(d1, col = adjustcolor('gray', alpha.f = .40), border="black") 
d2 <- density(subset(value,genotype=="B"))
polygon(d2, col = adjustcolor('gray', alpha.f = .40), border="black")

Obviously I can use "abline" function to find the best cutoff between the two densities, but is there a neater way to identify the cutoff?

So you are defining the cutoff by the point where the two densities have the same value (assuming there is exactly one such point), right? — Julius Vainora, Dec 05 '15 at 19:00
Possible duplicate of [How to find the intersection of two densities with ggplot2 in R](http://stackoverflow.com/questions/25453706/how-to-find-the-intersection-of-two-densities-with-ggplot2-in-r) — Julius Vainora, Dec 06 '15 at 02:51
It is similar, but there are two differences: 1, I am trying to do it without ggplot. 2, I am trying to get R to give me the actual number of intersection. — Oposum, Dec 06 '15 at 22:09
Check the accepted answer; only the last line there is about ggplot2, whereas the point that you want is named as `intersection.point`. — Julius Vainora, Dec 06 '15 at 22:46
Thanks, I was able to reproduce that, and posted and answer below. I did not quite understand the logic behind it though, for example what is the need for `n = 2^10` in the code — Oposum, Dec 06 '15 at 23:37
@Julius, interestingly though, it gives me two numbers and none of them visually appear to be at the intersection point. — Oposum, Dec 06 '15 at 23:48
First, there are two samples with different supports, so you want to estimate both densities in the common support to make things easier (hence `lower.limit`, `upper.limit` and `from`, `to` values). Second, you would like to get as precise as possible value of `intersection.point`, so you estimate these two densities at `n = 2^10` points to have a pretty dense grid. Then to obtain `intersection.point` there are few more steps, check what each of them do separately. — Julius Vainora, Dec 06 '15 at 23:50
It should not be the case if you follow all the steps from that answer (instead of just one line). Once you get the result, feel free to accept your answer. — Julius Vainora, Dec 06 '15 at 23:51
Thanks, I found a minor error in my question code. After fixing that, the code in the answer works. Thanks for clarifying the purpose of `n = 2^10` — Oposum, Dec 07 '15 at 00:12

Oposum · Accepted Answer · 2015-12-07T00:11:53.730

0

Here is an answer based on the link provided by @Julius:

A.density <- density(subset(df, genotype == "A")$value, from = min(df$value), to = max(df$value), n = 2^10)
B.density <- density(subset(df, genotype == "B")$value, from = min(df$value), to = max(df$value), n = 2^10)
intersection.point <- A.density$x[which(diff((A.density$y - B.density$y) > 0) != 0) + 1]

edited Dec 07 '15 at 00:11

answered Dec 06 '15 at 23:37

Oposum

1,155
3
22
38

Here is another, simpler solution that seems to be more precise: `library(pROC)` , once the package is loaded, run this command: `coords(roc(genotype,value), "b", ret="t")` – Oposum Jan 24 '16 at 16:16

Finding cutoff between two overlapping density plots

1 Answers1