Objective
Automate the process of finding the best-fit distribution with the gamlss
package and generating random numbers from that distribution.
Example
My actual data has several variables, so I will use two variables from the iris
dataset in this example. Say I want to generate random numbers from the best-fit distributions of sepal length and petal length. I can do this as follows:
library(gamlss)
# Load data-------
data("iris")
# Define a function that finds the best fit distribution
find_dist <- function(x) {
  m1 <- fitDist(x, k = 2, type = "realAll", trace = FALSE, try.gamlss = TRUE)
  m1
}
# Best fit distribution for Sepal.Length---------
dist_Sepal.Length <- find_dist(iris$Sepal.Length)
family_Sepal.Length <- dist_Sepal.Length$family[1] # "SEP4"
dist_Sepal.Length$Allpar
# eta.mu eta.sigma eta.nu eta.tau
# 5.8269404 0.3019834 1.8481415 0.8684266
dist_Sepal.Length$mu.link #identity
dist_Sepal.Length$sigma.link #log
dist_Sepal.Length$nu.link #log
dist_Sepal.Length$tau.link #log
## Generate a random number:
rSEP4(1, mu = 5.827, sigma = exp(0.302), nu = exp(1.848), tau = exp(0.8684))
# Best fit distribution for Petal.Length---------
dist_Petal.Length <- find_dist(iris$Petal.Length)
family_Petal.Length <- dist_Petal.Length$family[1] # "SEP2"
dist_Petal.Length$Allpar
# eta.mu eta.sigma eta.nu eta.tau
# 4.248646 1.057717 -26.546283 3.594178
dist_Petal.Length$mu.link #identity
dist_Petal.Length$sigma.link #log
dist_Petal.Length$nu.link #identity
dist_Petal.Length$tau.link #log
## Generate a random number:
rSEP2(1, mu = 4.249, sigma = exp(1.058), nu = -26.546, tau = exp(3.594))
Challenges in Creating a Function to Automate Generating Random Numbers
I can extract the distribution name from the family element and all parameter estimates from the Allpar element. The challenge is that each distribution has different parameters and link functions, and Allpar stores the estimates on the link scale (eta.mu, eta.sigma, and so on). Otherwise, I could pass Allpar directly to the random number function.
How can I automate this process?
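To make the goal concrete, here is a minimal sketch of the kind of helper I have in mind. It assumes the fitted object exposes family, Allpar (named eta.<parameter>) and a <parameter>.link string for each parameter, as in the output above, and that every link is one base R's stats::make.link() understands (identity, log, logit, ...). The names params_from_fit and rand_from_fit are hypothetical, and links outside that set would need separate handling.

```r
# Hypothetical helper: map the link-scale estimates in `Allpar` back to the
# natural parameter scale using each parameter's stored link name.
params_from_fit <- function(fit) {
  # "eta.mu" -> "mu", "eta.sigma" -> "sigma", ...
  par_names <- sub("^eta\\.", "", names(fit$Allpar))
  natural <- lapply(par_names, function(p) {
    link <- fit[[paste0(p, ".link")]]        # e.g. "identity" or "log"
    eta  <- fit$Allpar[[paste0("eta.", p)]]  # estimate on the link scale
    stats::make.link(link)$linkinv(eta)      # apply the inverse link
  })
  setNames(natural, par_names)
}

# Hypothetical helper: look up the r<family> generator (e.g. rSEP4) by name
# and call it with the back-transformed parameters.
rand_from_fit <- function(fit, n = 1) {
  r_fun <- get(paste0("r", fit$family[1]), mode = "function")
  do.call(r_fun, c(list(n = n), params_from_fit(fit)))
}
```

With gamlss attached and the fits from the example available, the intended usage would be rand_from_fit(dist_Sepal.Length, n = 10) and rand_from_fit(dist_Petal.Length, n = 10), with no per-distribution code.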