I have a large dataset with more than 20 million values. For privacy reasons, and to keep the code reproducible, I use mydata as a stand-in for it.
set.seed(1234)
mydata <- rlnorm(28000000,3.14,1.3)
I want to find which known distribution fits mydata best, so I chose the fitdist function from the fitdistrplus package.
library(fitdistrplus)
fit.lnorm <- fitdist(mydata,"lnorm")
fit.weibull <- fitdist(mydata, "weibull")
fit.gamma <- fitdist(mydata, "gamma", lower = c(0, 0))
fit.exp <- fitdist(mydata,"exp")
Then I use the ppcomp function to draw a P-P plot to help me choose the best-fitting distribution.
library(RColorBrewer)
tiff("./pplot.tiff", res = 300, compression = "lzw",
     height = 6, width = 10, units = "in", pointsize = 12)
ppcomp(list(fit.lnorm, fit.weibull, fit.gamma, fit.exp),
       fitcol = brewer.pal(9, "Set1")[1:4],
       legendtext = c("lnorm", "weibull", "gamma", "exp"))
dev.off()
Clearly, the lognormal fits mydata best. But look at the legend of the plot: the colored line samples for each distribution are missing, and only the text labels are shown.
I tried some datasets with far fewer values, and the legend was drawn correctly, so the size of the data seems to cause the problem. What should I do to make the legend display properly?
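For comparison, here is a minimal sketch of the kind of small-sample check I mean. The sample size of 1000, the object names (small, fit.small.lnorm, fit.small.weibull), and the two colors are only illustrative assumptions, not the exact values I used.

```r
library(fitdistrplus)

set.seed(1234)
# Hypothetical small sample: same distribution as mydata, but only 1000 values
small <- rlnorm(1000, 3.14, 1.3)

# Fit two of the candidate distributions by maximum likelihood (fitdist default)
fit.small.lnorm <- fitdist(small, "lnorm")
fit.small.weibull <- fitdist(small, "weibull")

# Same ppcomp call pattern as above, drawn on the default graphics device
ppcomp(list(fit.small.lnorm, fit.small.weibull),
       fitcol = c("red", "blue"),
       legendtext = c("lnorm", "weibull"))
```

With a sample of this size the legend in my plots showed the colored symbols next to the text, which is why I suspect the number of observations.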