
I have data generated by passing random values through an algorithm, and I would like to get an idea of the distribution of the results. I have read the discussions here regarding Anderson-Darling/Shapiro-Wilk tests and Q-Q plots, and thus will consider myself warned against drawing unwarranted conclusions. However, I get different results using the qqPlot function from two different packages: library(car) versus library(qualityTools).

It turns out I have insufficient reputation to post images, so I cannot show my data or the relevant plots. (I took a sample of 10,000 points from a much larger set and did not know whether it was appropriate or possible to upload such a set.) I tried normal, log-normal, and gamma; some of the other distributions worked but were clearly unsuited to the data, and others gave errors and no plot at all. Of those three, the data follow gamma most closely, with some deviations starting towards the upper end of the quantiles. As an example, though, say I check with the following:

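library(MASS)
library(qualityTools)  # qqPlot() here comes from qualityTools
library(gplots)
set.seed(123)
x <- rgamma(1000, 200, 200)  # shape = 200, rate = 200
par(mfrow = c(2, 2))  # 2 x 2 plot grid; "mfrow <- c(2, 2)" would not set it
qqPlot(x, "normal")
qqPlot(x, "log-normal")
qqPlot(x, "gamma")
plot(density(x))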

which yields one set of plots, whereas clearing the loaded packages, restarting R, and using the car package instead

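library(car)  # qqPlot() here comes from car
library(gplots)
set.seed(123)
x <- rgamma(1000, 200, 200)  # shape = 200, rate = 200
par(mfrow = c(2, 2))  # 2 x 2 plot grid; "mfrow <- c(2, 2)" would not set it
qqPlot(x, "norm")
qqPlot(x, "lnorm")
qqPlot(x, "gamma")  # fails: no shape parameter supplied
plot(density(x))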

yields something else. The gamma plot is missing because the car version wants a shape parameter. I presume there are parameters not being set in the car version that have defaults or are assumed in qualityTools, from which I might also guess that the car version is more robust (less of a black box). I downloaded the source code for both qualityTools and MASS (which gets loaded automatically when I run qqPlot) but cannot find qqPlot in any of the files (using a Windows file-content search of the untarred folders).

My ultimate goal is to find a distribution that models my random data, or comes as close as possible. Is this method (trying out distributions) even valid? If so, how do I get initial estimates for the parameters (e.g. shape, degrees of freedom, etc.) needed to check other distributions? Any help would be appreciated, thanks.
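
To make the gamma case concrete, here is a minimal sketch of what I imagine supplying the missing parameters might look like. It assumes that method-of-moments estimates are acceptable as initial values, and that car's qqPlot passes extra arguments through to the quantile function (qgamma here). The names mom_shape and mom_rate are my own and come from neither package.

```r
set.seed(123)
x <- rgamma(1000, 200, 200)

# Method-of-moments estimates for a gamma distribution, using
# mean = shape / rate and variance = shape / rate^2
mom_shape <- mean(x)^2 / var(x)
mom_rate  <- mean(x) / var(x)

if (requireNamespace("car", quietly = TRUE)) {
  # Assumption: extra arguments are forwarded to qgamma()
  car::qqPlot(x, distribution = "gamma", shape = mom_shape, rate = mom_rate)
} else {
  # Base-R fallback: theoretical gamma quantiles against the sample
  qqplot(qgamma(ppoints(length(x)), shape = mom_shape, rate = mom_rate), x,
         xlab = "gamma quantiles", ylab = "x")
  abline(0, 1)  # reference line y = x
}
```

If something like this is reasonable, I suppose MASS::fitdistr(x, "gamma") could then refine the estimates by maximum likelihood, but I do not know whether that is how qualityTools arrives at its parameters.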

JasonD