2

Let me begin by saying this is a class assignment for an intro to R course.

First, in VGAM why are there dparetoI, ParetoI, pparetoI, qparetoI & rparetoI?
Are they not the same things?

My problem: I would like to generate 50 random numbers in a pareto distribution. I would like the range to be 1 – 60 but I also need to have 30% of the values <= 4.

Using VGAM I have tried a variety of functions and combinations of pareto from what I could find in documentation as well as a few things online.

I experimented with fit, quantiles and forcing a sequence from examples I found but I'm new and didn't make much sense of it.

I’ve been using this:

alpha <- 1   # location 
k <- 2 # shape     
mySteps <- rpareto(50,alpha,k)
range(mySteps)
str(mySteps[mySteps <= 4])

After enough iterations, the range will be acceptable but entries <= 4 are never close.

So my questions are:

Am I using the right pareto function? If not, can you point me in the right direction? If so, do I just keep running it until the “right” data comes up?

Thanks for the guidance.

MMDG
  • 83
  • 1
  • 9
  • what level of stats do you know? This follows the same structure as the other distributions in r. `rnorm` generates random normals, `dnorm` for the density, `runinf` and `dunif` are the same for the uniform distribution, so yes, `rpareto` is the right one for random numbers draw from the Pareto distribution – rawr Feb 15 '14 at 02:30
  • I took stats a very long time ago and it feels like I'm starting over. Thanks. Good to know I'm on the right track. – MMDG Feb 15 '14 at 14:46

1 Answers1

1

So reading the Wikipedia entry for Pareto Distribution, you can see that the CDF of the Pareto distribution is given by:

FX(x) = 1 - (xm/x)α

The CDF gives the probability that X (your random variable) < x (a given value). You want Pareto distributions where

Prob(X < 4) ≡ FX(4) = 0.3

or

0.3 = 1 - (xm/4)α

This defines a relation between xm and α

xm = 4 * (0.7)1/α

In R code:

library(VGAM)
set.seed(1)
alpha <- 1
k     <- 4 * (0.7)^(1/alpha)
X     <- rpareto(50,k,alpha)
quantile(X,0.3)   # confirm that 30% are < 4
#      30% 
# 3.891941

Plot the histogram and the distribution

hist(X, breaks=c(1:60,Inf),xlim=c(1,60))
x <- 1:60
lines(x,dpareto(x,k,alpha), col="red")

If you repeat this process for different alpha, you will get different distribution functions, but in all cases ~30% of the sample will be < 4. The reason it is only approximately 30% is that you have a finite sample size (50).

Community
  • 1
  • 1
jlhoward
  • 58,004
  • 7
  • 97
  • 140