This may be a simple question, but I have been trying to fix this issue for a few days, so any help is really appreciated. If there is something wrong with my question, please leave a comment so I know what to fix.
I have the following code:
library(ggplot2)
library(rlang)
library(MASS)
# Rescale x to the open interval (0, 1); the small offsets keep values strictly inside it
normalize_btw_zero_and_one <- function(x) {
  (x - min(x) + .00000001) / (max(x) - min(x) + .00000002)
}

get_histogram <- function(data_set, column_name, bin_width, attribute_name) {
  colname <- as_label(enquo(column_name))

  # Maximum-likelihood fits; the Beta fit uses the rescaled data, the others use the raw data
  gamma_par <- MASS::fitdistr(data_set[[colname]], "gamma")
  weibull_par <- MASS::fitdistr(data_set[[colname]], "weibull")
  beta_par <- MASS::fitdistr(normalize_btw_zero_and_one(data_set[[colname]]), dbeta,
                             start = list(shape1 = 1, shape2 = 1))

  ggplot(data_set, aes(x = {{ column_name }})) +
    geom_histogram(aes(y = after_stat(density)), binwidth = bin_width,
                   fill = "lightblue", colour = "black") +
    xlab(paste0(attribute_name)) +
    stat_function(fun = dnorm,
                  args = list(mean = mean(data_set[[colname]]), sd = sd(data_set[[colname]])),
                  mapping = aes(colour = "Normal")) +
    stat_function(fun = dlnorm,
                  args = list(meanlog = mean(log(data_set[[colname]])),
                              sdlog = sd(log(data_set[[colname]]))),
                  mapping = aes(colour = "LogNormal")) +
    stat_function(fun = dgamma,
                  args = list(shape = gamma_par$estimate[[1]], rate = gamma_par$estimate[[2]]),
                  mapping = aes(colour = "Gamma")) +
    stat_function(fun = dweibull,
                  args = list(shape = weibull_par$estimate[[1]], scale = weibull_par$estimate[[2]]),
                  mapping = aes(colour = "Weibull")) +
    # No args are passed here, so dexp uses its default rate = 1
    stat_function(fun = dexp, mapping = aes(colour = "Exponential")) +
    stat_function(fun = dbeta,
                  args = list(shape1 = beta_par$estimate[[1]], shape2 = beta_par$estimate[[2]]),
                  mapping = aes(colour = "Beta")) +
    scale_colour_manual("Distribution",
                        values = c("red", "blue", "lightgreen", "pink", "purple", "yellow"))
}

set.seed(30333)
test_dt <- rnorm(10000, 30, 1)
df <- data.frame(test_dt)
rm(test_dt)
get_histogram(df, test_dt, .3, "Test ")
Issue: I want to create a histogram and overlay several fitted density functions on it. Everything works except for the Beta distribution. I rescaled the data to (0, 1) to estimate the Beta parameters, but the histogram is still drawn on the original scale of the data, so the Beta curve is evaluated on (0, 1) and does not line up with the histogram.
Question 1: How can I fix the overlay of the Beta distribution function?
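One direction I am considering, as a rough sketch rather than a confirmed fix: because the Beta parameters are fitted on data rescaled to (0, 1), the density could be mapped back to the original scale by applying the same rescaling to x and dividing by the width of the range (the Jacobian of the transformation). The helper dbeta_rescaled and its arguments x_min and x_max are hypothetical names that are not part of my code, and the shape values in the check are arbitrary.

```r
# Hypothetical helper: Beta density expressed on the original data scale.
# x_min and x_max are assumed to be the min and max of the data that was rescaled.
dbeta_rescaled <- function(x, shape1, shape2, x_min, x_max) {
  eps <- 1e-8                         # same offset as normalize_btw_zero_and_one()
  width <- x_max - x_min + 2 * eps    # length of the original range
  u <- (x - x_min + eps) / width      # map x onto (0, 1)
  dbeta(u, shape1, shape2) / width    # divide by the Jacobian so the area stays 1
}

# Sanity check with arbitrary shapes: the area over the original range
# should be very close to 1 if the rescaling is right.
area <- integrate(function(x) dbeta_rescaled(x, 2, 3, 26, 34), 26, 34)$value
print(area)
```

Inside get_histogram this would presumably replace fun = dbeta in the last stat_function() call, with x_min = min(data_set[[colname]]) and x_max = max(data_set[[colname]]) added to args next to the two fitted shapes.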
Question 2: If I want to have frequencies on the y-axis instead of densities, what steps should I take?
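My current understanding, which may be incomplete: the expected count in a bin is roughly n * bin_width * density, so each density curve would need to be multiplied by that factor to match a histogram of counts. The sketch below checks this idea in base R only. The wrapper as_count_curve is a hypothetical name, and the break points are just an assumption made for the check.

```r
# Hypothetical wrapper: turn a density function into an "expected count per bin" curve
as_count_curve <- function(density_fun, n, bin_width) {
  function(x, ...) n * bin_width * density_fun(x, ...)
}

set.seed(30333)
x <- rnorm(10000, 30, 1)
bin_width <- 0.3
count_dnorm <- as_count_curve(dnorm, length(x), bin_width)

# Compare observed bin counts with the scaled normal curve at the bin midpoints
breaks <- seq(floor(min(x)), ceiling(max(x)) + bin_width, by = bin_width)
h <- hist(x, breaks = breaks, plot = FALSE)
expected <- count_dnorm(h$mids, mean = mean(x), sd = sd(x))
print(head(data.frame(mid = h$mids, observed = h$counts, expected = round(expected, 1))))
```

In the ggplot code this would presumably mean using aes(y = after_stat(count)) in geom_histogram() and passing, for example, fun = as_count_curve(dnorm, nrow(data_set), bin_width) to stat_function(), with args unchanged.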