I am trying to evaluate the fit of my mixture model by overlaying the histogram of the original data with three beta distributions. I am using ggplot2 for visualization, but I am facing issues with the overlay.
The histogram of the original data shows up as expected, but the beta distributions are squeezed against the left side of the plot and appear disconnected. I have tried several modifications to the code but have not been able to resolve the issue.
Can anyone help me with creating a proper overlay of histogram and multiple beta distributions in ggplot2 to evaluate the fit of my mixture model?
My code is below.
```r
library(ggplot2)

# Fitted shape parameters for the three beta components
alpha <- c(5.294863, 23.065857, 29.756515)
beta <- c(93.69862, 75.92762, 69.23697)

# C1, C2, C3 and vaf_trim are not shown here
beta1 <- dbeta(C1, alpha[1], beta[1])
beta2 <- dbeta(C2, alpha[2], beta[2])
beta3 <- dbeta(C3, alpha[3], beta[3])

# Tabulate the observed values; density is the percentage of observations per value
vaf_hist <- data.frame(table(vaf_trim))
vaf_hist$density <- vaf_hist$Freq / length(vaf_trim) * 100

plot <- ggplot() +
  # geom_bar(data = vaf_hist, aes(x = vaf_trim, y = Freq), position = "stack", stat = "identity") +
  geom_bar(data = vaf_hist, aes(x = vaf_trim, y = density),
           position = "stack", stat = "identity") +
  geom_line(aes(C1, beta1, color = "Cluster1"),
            size = 1.2, position = "stack", stat = "identity",
            linetype = "dotdash") +
  geom_line(aes(C2, beta2, color = "Cluster2"),
            size = 1.2, position = "stack", stat = "identity",
            linetype = "dotdash") +
  geom_line(aes(C3, beta3, color = "Cluster3"),
            size = 1.2, position = "stack", stat = "identity",
            linetype = "dotdash") +
  labs(title = "Mixture of 3 Betas",
       x = "Variant Allele Frequency",
       y = "Density",
       colour = "Beta Distr",
       caption = "DPMM with simulated data()")
plot
```
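As a quick sanity check on where each fitted component should sit on the x-axis, the mean of a Beta(α, β) distribution is α / (α + β) and, for α, β > 1, its mode is (α − 1) / (α + β − 2). The sketch below applies these formulas to the parameters above; `comp_mean` and `comp_mode` are illustrative names, not part of the original script.

```r
# Shape parameters from the question
alpha <- c(5.294863, 23.065857, 29.756515)
beta <- c(93.69862, 75.92762, 69.23697)

# Closed-form mean and mode of each beta component
comp_mean <- alpha / (alpha + beta)
comp_mode <- (alpha - 1) / (alpha + beta - 2)

print(round(comp_mean, 3))
print(round(comp_mode, 3))
```

The three means come out at roughly 0.05, 0.23 and 0.30, so on a correctly scaled numeric axis the curves should peak near those values rather than bunching up at the left edge.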
Here is a sample of the data used (`vaf_hist`):

```
   vaf_trim Freq      density
1      0.02  930  2.175438596
2      0.03 3788  8.860818713
3      0.04 4743 11.094736842
4      0.05 4973 11.632748538
5      0.06 3995  9.345029240
6      0.07 2899  6.781286550
7      0.08 1733  4.053801170
8      0.09  951  2.224561404
9       0.1  443  1.036257310
10     0.11  192  0.449122807
```
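One possible cause of the squeezed curves is that `data.frame(table(vaf_trim))` stores `vaf_trim` as a factor, so the bars sit on a discrete axis (positions 1, 2, 3, ...) while the beta curves use numeric values between 0 and 1. The following is a minimal sketch of one way to put both on the same numeric axis, using the sample rows above. It rests on several assumptions: the values are binned at a width of 0.01 (in which case `Freq / length(vaf_trim) * 100` is already on the scale of a probability density), the mixture weights in `weights` are placeholders because the fitted weights are not given, and `vaf_num`, `x_grid` and `curves` are illustrative names.

```r
library(ggplot2)

# Sample rows from the question; vaf_trim is a factor, as table() would produce
vaf_hist <- data.frame(
  vaf_trim = factor(c("0.02", "0.03", "0.04", "0.05", "0.06",
                      "0.07", "0.08", "0.09", "0.1", "0.11")),
  Freq = c(930, 3788, 4743, 4973, 3995, 2899, 1733, 951, 443, 192),
  density = c(2.175438596, 8.860818713, 11.094736842, 11.632748538,
              9.345029240, 6.781286550, 4.053801170, 2.224561404,
              1.036257310, 0.449122807)
)

# Convert the factor labels back to numbers so the x-axis is continuous
vaf_hist$vaf_num <- as.numeric(as.character(vaf_hist$vaf_trim))

alpha <- c(5.294863, 23.065857, 29.756515)
beta <- c(93.69862, 75.92762, 69.23697)
weights <- c(1/3, 1/3, 1/3)  # ASSUMPTION: placeholder mixture weights

# Evaluate each weighted component on a regular grid instead of at the data points
x_grid <- seq(0.001, 0.6, length.out = 500)
curves <- do.call(rbind, lapply(seq_along(alpha), function(k) {
  data.frame(
    x = x_grid,
    density = weights[k] * dbeta(x_grid, alpha[k], beta[k]),
    cluster = paste0("Cluster", k)
  )
}))

p <- ggplot() +
  geom_col(data = vaf_hist, aes(x = vaf_num, y = density),
           width = 0.01, fill = "grey70") +
  geom_line(data = curves, aes(x = x, y = density, colour = cluster),
            linetype = "dotdash") +
  labs(title = "Mixture of 3 Betas",
       x = "Variant Allele Frequency",
       y = "Density",
       colour = "Beta Distr")
p
```

Evaluating `dbeta()` on a sorted grid rather than on the per-cluster observations keeps each line continuous, and dropping `position = "stack"` from the lines prevents the curves from being added on top of one another. With the real fitted weights in place of the placeholders, the sum of the three curves should be directly comparable to the bar heights.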