0

I am fitting a Pareto distribution to some data and have already estimated the maximum likelihood estimates for the data. Now I need to create a fitdist (fitdistrplus library) object from it, but I am not sure how to do this. I need a fitdist object because I would like to create qq, density etc. plots with the function such as denscomp. Could someone help?

The reason I calculated the MLEs first is because fitdist does not do this properly - the estimates always blow up to infinity, even if I give the correct MLEs as the starting values (see below). If the earlier option of manually giving fitdist my parameters is not possible, is there an optimization method in fitdist that would allow the pareto parameters to be properly estimated?

I don't have permission to post the original data, but here's a simulation using MLE estimates of a gamma distribution/pareto distribution on the original.

library(fitdistrplus)
library(actuar)

sim <- rgamma(1000, shape = 4.69, rate = 0.482)
fit.pareto <- fit.dist(sim, distr = "pareto", method = "mle", 
                       start = list(scale = 0.862, shape = 0.00665))
#Estimates blow up to infinity
fit.pareto$estimate
BaroqueFreak
  • 125
  • 1
  • 5

1 Answers1

1

If you look at the ?fitdist help topic, it describes what fitdist objects look like: they are lists with lots of components. If you can compute substitutes for all of those components, you should be able to create a fake fitdist object using code like

fake <- structure(list(estimate = ..., method = ..., ...),
                  class = "fitdist")

For the second part of your question, you'll need to post some code and data for people to improve.

Edited to add:

I added set.seed(123) before your simulation of random data. Then I get the MLE from fitdist to be

   scale    shape 
87220272  9244012

If I plot the log likelihood function near there, I get this:

loglik <- Vectorize(function(shape, scale) sum(dpareto(sim, shape, scale, log = TRUE)))
shape <- seq(1000000, 10000000, len=30)
scale <- seq(10000000, 100000000, len=30)
surface <- outer(shape, scale, loglik)
contour(shape, scale, surface)
points(9244012, 87220272, pch=16)

screenshot

That looks as though fitdist has made a somewhat reasonable choice, though there may not actually be a finite MLE. How did you find the MLE to be such small values? Are you sure you're using the same parameters as dpareto uses?

user2554330
  • 37,248
  • 4
  • 43
  • 90
  • I don't have permission to post the data, but I have edited the question to include fitting to simulated data that approximates the original. Do you mind taking a look? – BaroqueFreak Oct 20 '20 at 13:47
  • Really appreciate your efforts. I computed the MLE's analytically using $\text{scale} = \min_{i=1,2,...n} x_{i}$ and $\text{shape} = n/(\sum_{i=1}^{n}\ln(x_{i}) - n\ln(\text{scale}))$. On reinspection, it seems that this is a different parameterisation of the pareto distribution compared to $\texttt{dpareto}$. However, this parameterisation is only different through a shifting of the scale - I feel like I should still get more reasonable parameters than what fitdist has given. – BaroqueFreak Oct 21 '20 at 10:31
  • 1
    That MLE is for the "European Pareto", according to notation in Rytgaard (1990, ASTIN Bulletin). The `dpareto` density is for the "American Pareto". They don't give the MLE for the American Pareto. You'll probably need to go to one of the references on the `?dpareto` help page. – user2554330 Oct 21 '20 at 12:30