2

I have this sample data of expression of two different gene variants:

value<-cbind(c(rnorm(100,500,90),rnorm(100,800,120)))
genotype<-cbind(c(rep("A",100),rep("B",100)))
df<-cbind(value,genotype)
df<-as.data.frame(df)
colnames(df)<-c("value","genotype")
df$value<-as.numeric(as.character(df$value))

I plotted these two genotype variants by their expression and am trying to determine the optimal cutoff value of the assay that differentiates between them:

d <- density(value)
plot(d, main="Genotypes A and B", ,type="n",xlim=c(200,1100),ylim=c(0,0.005),xlab="Units of expression",ylab="")
d1 <- density(subset(value,genotype=="A"))
polygon(d1, col = adjustcolor('gray', alpha.f = .40), border="black") 
d2 <- density(subset(value,genotype=="B"))
polygon(d2, col = adjustcolor('gray', alpha.f = .40), border="black") 

Obviously I can use "abline" function to find the best cutoff between the two densities, but is there a neater way to identify the cutoff?

Oposum
  • 1,155
  • 3
  • 22
  • 38
  • So you are defining the cutoff by the point where the two densities have the same value (assuming there is exactly one such point), right? – Julius Vainora Dec 05 '15 at 19:00
  • yes, exactly - where they cross each other. – Oposum Dec 06 '15 at 02:09
  • Possible duplicate of [How to find the intersection of two densities with ggplot2 in R](http://stackoverflow.com/questions/25453706/how-to-find-the-intersection-of-two-densities-with-ggplot2-in-r) – Julius Vainora Dec 06 '15 at 02:51
  • It is similar, but there are two differences: 1, I am trying to do it without ggplot. 2, I am trying to get R to give me the actual number of intersection. – Oposum Dec 06 '15 at 22:09
  • Check the accepted answer; only the last line there is about ggplot2, whereas the point that you want is named as `intersection.point`. – Julius Vainora Dec 06 '15 at 22:46
  • Thanks, I was able to reproduce that, and posted and answer below. I did not quite understand the logic behind it though, for example what is the need for `n = 2^10` in the code – Oposum Dec 06 '15 at 23:37
  • @Julius, interestingly though, it gives me two numbers and none of them visually appear to be at the intersection point. – Oposum Dec 06 '15 at 23:48
  • First, there are two samples with different supports, so you want to estimate both densities in the common support to make things easier (hence `lower.limit`, `upper.limit` and `from`, `to` values). Second, you would like to get as precise as possible value of `intersection.point`, so you estimate these two densities at `n = 2^10` points to have a pretty dense grid. Then to obtain `intersection.point` there are few more steps, check what each of them do separately. – Julius Vainora Dec 06 '15 at 23:50
  • It should not be the case if you follow all the steps from that answer (instead of just one line). Once you get the result, feel free to accept your answer. – Julius Vainora Dec 06 '15 at 23:51
  • Thanks, I found a minor error in my question code. After fixing that, the code in the answer works. Thanks for clarifying the purpose of `n = 2^10` – Oposum Dec 07 '15 at 00:12

1 Answers1

0

Here is an answer based on the link provided by @Julius:

A.density <- density(subset(df, genotype == "A")$value, from = min(df$value), to = max(df$value), n = 2^10)
B.density <- density(subset(df, genotype == "B")$value, from = min(df$value), to = max(df$value), n = 2^10)
intersection.point <- A.density$x[which(diff((A.density$y - B.density$y) > 0) != 0) + 1]
Oposum
  • 1,155
  • 3
  • 22
  • 38
  • Here is another, simpler solution that seems to be more precise: `library(pROC)` , once the package is loaded, run this command: `coords(roc(genotype,value), "b", ret="t")` – Oposum Jan 24 '16 at 16:16