3

I hope to find weibull shape and scale parameters for a distribution that is left truncated using R's fitdistr function (MLE). Using a sample of data of tree diameters (the smallest of which being 2.8):

data<-c(42.7,18.8,30.0,20.3,32.5,18.8,16.0,42.9,18.8,17.3,21.1,23.4,15.0,16.8,15.2,15.0,14.7,17.3,20.1,18.3,16.0,15.7,21.3,
    19.1,17.3,17.0,17.3,17.5,21.6,15.7,12.7,13.2,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,2.8,2.8,2.8,2.8,2.8,
    2.8,2.8,2.8,2.8,2.8,2.8,2.8,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,4.3,4.3,4.3,4.3,4.3,4.3,
    4.3,4.3,4.3,4.3,4.3,4.3,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,
    5.6,5.6,18.0,16.3,34.8,17.5,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.3,6.3,6.3,6.3,6.3,6.3,6.3,6.3,6.3,
    6.3,6.3,6.3,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8)

library(MASS)
wb<-fitdistr(data,'weibull',lower=0.1) # MLE for weibull parameter determination
wb

Results: shape scale
1.36605920 9.97356797

Given the data distribution, one would expect a negative monotonic curve (e.g. shape<1). However, these results indicate shape>1 because fitdistr did not take into account the fact that the data was left truncated. Elsewhere, the following was suggested:

ltwei<-function(x,shape,scale=1,log=FALSE){dweibull(x,shape,scale,log)/pweibull(1,shape,scale,lower=FALSE) } 
ltweifit<-fitdistr(data,ltwei,start=list(shape=1,scale=10)) 
ltweifit 

but this results in an even larger weibull shape value. How can I produce shape and scale parameters for the distribution that take the data's left truncation into account? Many thanks in advance.

treetopdewdrop
  • 172
  • 5
  • 15
  • The weibull distribution is so flexible that I think it's pretty optimistic to expect a fitting algorithm to infer a truncation-effect. A random weibull vector created with those parameters "looks" a lot like your data's density when examined with `plot(density(data))`. If you have prior knowledge of truncation (which you have not yet provided), then you will need to supply it to the fitting algorithm. – IRTFM Jan 22 '15 at 17:41
  • Thanks BondedDust. I do have prior knowledge of a truncation (e.g. nothing below 2.8 was even measured). So while plot(density(data)) does resemble a random weibull vector with the fitdistr estimated parameters, it fails to take into account that values below 2.8 have been truncated. The "true" fit should start at 2.8 and monodically descend. I am looking for the weibull shape and scale parameters for this fit. You state "then you will need to supply it to the fitting algorithm" - this is precisely what I am asking for advice on how to do. Thanks again! – treetopdewdrop Jan 22 '15 at 17:48
  • I have reservations about using the smallest value as the truncation point. I would have thought that the truncation value would be set by some externality. – IRTFM Jan 22 '15 at 18:34
  • Heads up @treetopderop: You have now posted three questions and gotten what appeared to be useful answers and neither upvoted nor accepted any of them. – IRTFM Jan 22 '15 at 18:54
  • The truncation point is set by an external factor - the smallest size class measured in the dataset. Thanks for the heads up, I've accepted those answers, however this question is still not answered (see below) – treetopdewdrop Jan 22 '15 at 20:30

1 Answers1

1

This is a modification of the advice given because I thought the denominator was incorrect. Trying to change the denominator to be 1-pweibull(trunc, ...) See if this produces a better fit:

> ptwei <- function(x, shape, scale , log = FALSE) 
+               pweibull(x, shape, scale, log)/(1-
+                 pweibull(2.8, shape, scale, lower=FALSE))
> dtwei <- function(x, shape, scale , log = FALSE) 
+               dweibull(x, shape, scale, log)/(1-
+                 pweibull(2.8, shape, scale, lower=FALSE))
> fitdist(data, dtwei, start=list(shape=1.2, scale=4))
Fitting of the distribution ' twei ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape 0.4555414 0.02660275
scale 0.8751571 0.12236256

I've now realized the Prof Ripley was using the lower argument to achieve what I was doing above, so the original code would work as long as the ltwei function was called with lower = FALSE

-----------

This is following the advice of Prof. Brian Ripley found on Rhelp posting of Oct 7, 2008:

ltwei <- function(x, shape, scale = 1, log = FALSE) 
              dweibull(x, shape, scale, log)/
                pweibull(2.8, shape, scale, lower=FALSE)

The weibull density is normalized by dividing by the CMF at the truncation point. (With lower=FALSE the pweibull function returns the integrated density from the truncation point to Inf.)

library(MASS)
wb<-fitdistr(data,ltwei,start=list(shape=1,scale=1) ) 

There were 50 or more warnings (use warnings() to see the first 50)
> wb
      shape         scale   
   1.81253163   12.72912552 
 ( 0.07199877) ( 0.54855731)
> warnings()[1:5]
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In pweibull(2.8, shape, scale, lower = FALSE) : NaNs produced
3: In dweibull(x, shape, scale, log) : NaNs produced
4: In pweibull(2.8, shape, scale, lower = FALSE) : NaNs produced
5: In dweibull(x, shape, scale, log) : NaNs produced
> library(MASS)
> wb<-fitdistr(data,ltwei,start=list(shape=1.1,scale=10) ) 
> wb
      shape         scale   
   1.81253113   12.72912870 
 ( 0.07199877) ( 0.54855782)

You can see that some starting values generate warnings, but that the algorithm still succeeds. If you start with values that are closer to the "true values" the warnings don't appear. I'm afraid that your expectation was not met. Sometimes there is confusion about parametrization of the Weibull distribution because there are different conventions about how they are handled. This is what Terry Therneau has in his survival::survreg help page:

# There are multiple ways to parameterize a Weibull distribution. The survreg 
# function imbeds it in a general location-scale familiy, which is a 
# different parameterization than the rweibull function, and often leads
# to confusion.
#   survreg's scale  =    1/(rweibull shape)
#   survreg's intercept = log(rweibull scale)
#   For the log-likelihood all parameterizations lead to the same value.

Since you seemed unhappy with the results of fitdistr I also ran fitdist from the 'fitdistrplus'-package and got pretty much the same answer. I still think you need to examine the interpretation of the parameters and probably also the critical presumption that this is necessarily data that is Weibull-distributed:

 dtwei <- function(x, shape, scale , log = FALSE) 
               dweibull(x, shape, scale, log)/
                 pweibull(2.8, shape, scale, lower=FALSE)
 
ptwei <- function(x, shape, scale , log = FALSE) 
               pweibull(x, shape, scale, log)/
                 pweibull(2.8, shape, scale, lower=FALSE)
fitdist(data, dtwei, start=list(shape=1.2, scale=4))
#----------------
Fitting of the distribution ' twei ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape  1.812706 0.07199987
scale 12.728087 0.54839034
Community
  • 1
  • 1
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Thanks again @BondedDust. However this is the same code I had posted from Prof. Brian Ripley (with 2.8 as the truncation rather than 1). It produces a higher shape value than no-truncation, when clearly it should produce a shape<1. The histogram in the plot below shows a steeply descending distribution, yet Prof. Ripley's truncation code produces a unimodal distribution - even more negatively skewed than the non-truncated weibull parameterization. In fact it should be negative monotonic (extreme positive skew) – treetopdewdrop Jan 22 '15 at 23:01
  • Scale is a rough approximation of the mean (which is this case is 5.9). A scale value below 1 makes no sense if there are no values below 2.8. – treetopdewdrop Jan 23 '15 at 01:02
  • When I constructed a random subset with those parameters I got only a minority that were above 2.8. The inferences you are making do NOT apply to truncated distributions! Please do remember that YOU are the one who wanted this to be a particular truncated distribution. – IRTFM Jan 23 '15 at 01:13
  • Think of this as fitting the tail of a heavily left skewed distribution – IRTFM Jan 23 '15 at 01:41