0

Lets say I have the following data

library(ggplot2)
set.seed(2203)
data <- runif(100, 15, 45)
df <- data.frame(data)

And my end goal is to overlay 3 different distributions on the histogram depicting this data. I have no issue overlaying "Normal" and "LogNormal" probability density curve. My issue is with "Beta" distribution.

To overlay "Beta" distribution first I calculate shape1 and shape2 as follow. To make the data compatible for beta calculation I divide the values by 100.

library(fitdistrplus)    
parameters <- fitdist(data/100, "beta")
shape1 <- parameters$estimate[["shape1"]]
shape2 <- parameters$estimate[["shape2"]]

Calculating the values for shape1 and shape2 based on the above code, I replace these values in the last stat_funciton of my visualization. In the last step I overlay the three distributions of interest as follow:

graph_parameter_product <- function(data_set, column_value, bin_width, title_value) {
  graph <- ggplot(data_set, aes(x= column_value )) +
    geom_histogram(binwidth = bin_width, fill= "lightblue", colour= "black", aes(y=after_stat(density))) +
    labs(title = title_value) + 
    stat_function(fun = dnorm,
                  args = list(mean = mean(column_value), sd = sd(column_value)),
                  mapping=aes(colour = "Normal")) +
    stat_function(fun = dlnorm,
                  args = list(meanlog = mean(log(column_value)), sdlog = sd(log(column_value))),
                  linetype="dotted", linewidth=1.2, mapping = aes(colour = "LogNormal")) +
    stat_function(fun = function(x) dbeta(x, 7.339799, 17.05679), aes (color = "Beta"),
                  size = 1) +
    scale_colour_manual(values = c("red", "black", "green")) + 
    labs(colour="Distribution") 
  
  graph 
}

graph_parameter_product(df, data, 5, "Test Data")

As you see the Beta distribution density curve is not working correctly. What am I doing wrong?

Joe the Second
  • 339
  • 1
  • 11

1 Answers1

1

Note that beta distribution only works in [0,1]. And you do data/100 when estimating the parameters. Therefore, we can do the follow:

dbeta(x/100, 7.339799, 17.05679)/100

Full code:

graph_parameter_product <- function(data_set, column_value, bin_width, title_value) {
  graph <- ggplot(data_set, aes(x= column_value )) +
    geom_histogram(binwidth = bin_width, fill= "lightblue", colour= "black", aes(y=after_stat(density))) +
    labs(title = title_value) + 
    stat_function(fun = dnorm,
                  args = list(mean = mean(column_value), sd = sd(column_value)),
                  mapping=aes(colour = "Normal")) +
    stat_function(fun = dlnorm,
                  args = list(meanlog = mean(log(column_value)), sdlog = sd(log(column_value))),
                  linetype="dotted", linewidth=1.2, mapping = aes(colour = "LogNormal")) +
    stat_function(fun = function(x) dbeta(x/100, 7.339799, 17.05679)/100, aes (color = "Beta"),
                  size = 1) +
    scale_colour_manual(values = c("red", "black", "green")) + 
    labs(colour="Distribution") 
  
  graph 
}
one
  • 3,121
  • 1
  • 4
  • 24