1

I'm attempting to use the nlme package to fit the Generalized beta of the 2nd kind distribution to simulated health cost data.

Running the following code on a test dataset:

Package installation (if necessary)

install.packages("withr", dependencies = T)
library(withr)
with_makevars(c(PKG_CFLAGS ="-std=gnu99"), 
       install.packages("cubature"), assignment="+=") 
install.packages("GB2", dependencies = T)
    install.packages("nlme", dependencies = T)

# load packages
library(cubature)
library(GB2)
library(nlme)

# Binary independent variables
age <- rbinom(n=1000, size=1, prob=.3)
sex <- rbinom(n=1000, size=1, prob=.5)
trmt <- rbinom(n=1000, size=1, prob=.5)

# GB2 parameter equations
shape1 <- exp(rnorm(n=1000, mean=.1 + age/100 - sex/10 + trmt/10, sd=.3))
scale <- exp(rnorm(n=1000, mean=7 + age/50 + sex - trmt, sd=.5))
shape2 <- exp(rnorm(n=1000, mean=1.5 + age/100 + sex/10 - trmt/10, sd=.3))
shape3 <- exp(rnorm(n=1000, mean=.5 + age/100 - sex/10 - trmt/10, sd=.3))

# Outcome
y <- rgb2(1000, shape1, scale, shape2, shape3)

# Create test dataset
df <- data.frame(cbind(y,age,sex,trmt,shape1,scale,shape2,shape3))

# Fit GB2 distribution to data
gb2_fit <- nlme(y ~ scale*beta(shape2 + 1/shape1, shape3 - 1/shape1)/beta(shape2, shape3),
           # data = list(y=df_gb2_test[,1]),
           data = df,
           fixed = list(shape1 ~ age + sex + trmt, 
                        scale ~ age + sex + trmt, 
                        shape2 ~ age + sex + trmt, 
                        shape3 ~ age + sex + trmt),
           start = list(fixed = c(shape1 = 1.00, scale = 100, shape2 = 1.00, shape3 = 1.00)))

I get the error:

Error in parse(text = paste("~", paste(nVal, collapse = "/"))) : 
  <text>:2:0: unexpected end of input
1: ~ 
   ^

Any ideas what I'm doing wrong? I seem to be using the tilde operator correctly.

RobertF
  • 824
  • 2
  • 14
  • 40

2 Answers2

1

I think nlme doesn't do what you think it does. It does nonlinear least squares mixed models; i.e., the response is assumed to be Gaussian, and there is assumed to be a random effect (perhaps you're confusing this with SAS PROC NLMIXED, which is more general?

library(bbmle)

## we need a version of the density function that takes a 'log' argument
dgb2B <- function(..., log=FALSE) {
  r <- GB2::dgb2(...)
  if (!log) r else log(r)
}

## don't include shape1, scale shape2, shape3 in the data, that confuses things
df2 <- df[,c("y","age","sex", "trmt")]


## fit homogeneous model
m1 <- mle2(y ~ dgb2B(shape1, scale, shape2, shape3),
     method="Nelder-Mead",
     trace=TRUE,
     data=df2,
     start = list(shape1 = 1.00, scale = 100, shape2 = 1.00, shape3 = 1.00))

## allow parameters to vary by group 
mle2(y ~ dgb2B(shape1, scale, shape2, shape3),
     ## parameters need to be in the same order!
     parameters=list(shape1 ~ age + sex + trmt,
                     scale ~ age + sex + trmt,
                     shape2 ~ age + sex + trmt,
                     shape3 ~ age + sex + trmt),
     method="Nelder-Mead",
     control=list(maxit=10000,
                  ## set parameter scales equal to magnitude
                  ## of starting values; each top-level parameter
                  ## has 4 associated values (intercept, + 3 cov effects)
                  parscale=rep(abs(coef(m1)), each=4)),
     trace=TRUE,
     data=df2,
     start = as.list(coef(m1))
)

For what it's worth, for this example you could achieve the same goal by fitting eight separate models to all of the age × sex × treatment groups (but I can appreciate that your real application may be more complicated, i.e. you might only want a subset of the parameters to vary across groups, or might want to allow parameters to vary according to a continuous covariate.

If you are going to try much harder problems you might want to fit the parameters on the log scale.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Fantastic, thank you - yes I did confuse `nlme` with `proc nlmixed`. It looks like `mle2` is what I want, which saves me the work of hand-coding a solution to the log-likelihood of the GB2 distribution. – RobertF May 28 '21 at 02:51
  • The `GB2` package does include functions, `LogDensity()` and `LogLikelihood()`, that calculate the log density and log likelihood, respectively, of the GB2 distribution at specified parameter values. Can I use one of these as the first argument in the `mle2` function, for example `mle2((-1)*(GB2::LogLikelihood(shape1, scale, shape2, shape3)), parameters=list(....), method="Nelder-Mead", etc.)`? There's also a `MLfullGB2()` that calculates maximum likelihood estimation of the GB2 based on the full log-likelihood, but don't think it accepts parameters that are estimated by separate models. – RobertF May 28 '21 at 12:53
  • Does the current solution not do what you want? The formula interface only allows density/distribution functions that start with `d` and have a `log` argument. Using the versions that give you the gradients (score function) in `mle2` would probably be an improvement over what I've done here. – Ben Bolker May 28 '21 at 14:34
  • Oh I'm sorry - I thought the `...` indicated an unknown argument was still needed. Yes, it's working. Testing the homogeneous model fit with shape1=0.70, scale=300, shape2=3.80, shape3=1.63 (GB2 parameters when I fit my real health cost data), `mle2` converges to a solution that's close: 0.7468243 257.3361 3.652994 1.404586. I may have to play around with the initial starting parameter values & convergence method, or use a larger simulated dataset. – RobertF May 28 '21 at 15:12
  • 1
    You wouldn't necessarily expect the model to converge to exactly the same parameters ... The more thorough way to test that it's working would be to do fit to many simulated data sets with the same true parameters and show that the *average* of the estimated parameters was equal/close to the true values (i.e., unbiased estimation) – Ben Bolker May 28 '21 at 18:28
  • Good point - although the trend I'm seeing after running several times is the final estimate for the GB2 scale parameter tends to be within 10% the starting value. When I fiddled around with `proc nlmixed` years ago I think we used the log of the parameter values, as you recommended, to avoid getting stuck in local minima. – RobertF May 28 '21 at 18:41
  • Yes getting closer solutions if I take the natural log of the four GB2 parameter values and run: `m1 <- mle2(y ~ dgb2B(exp(shape1), exp(scale), exp(shape2), exp(shape3)), method="Nelder-Mead", trace=TRUE, data=data.frame(y), start = list(shape1 = 0.00, scale = 10.00, shape2 = 0.00, shape3 = 0.00))`. Do sometimes see wild values for the scale parameter estimate, but if I choose very large (vs. too small) initial values for the scale parameter, Nelder-Mead does better "sliding downhill" to the correct value of the scale parameter & other parameters. – RobertF Jun 01 '21 at 18:03
  • 1
    I think you could do better than this if you used gradient information. This is possible but a bit of a pain/more detail than I wanted to go into; it would be worth it if you you're going to be doing a lot of this sort of thing... – Ben Bolker Jun 01 '21 at 18:28
  • Are you referring to the conjugate gradient method (`method="CG"`)? – RobertF Jun 01 '21 at 18:46
  • No, I mean constructing a gradient function (there are a few ways to do this; it may be worth a separate question ...)? – Ben Bolker Jun 01 '21 at 20:06
  • The `GB2` package calculates Jacobian and Hessian matrix values for the four GB2 parameters, but if the four parameters are estimated from separate regressions with their own regression parameters, I'll have to manually calculate the 1st and 2nd derivative gradients. Not impossible but time consuming getting through the algebra. I know `proc nlmixed` does that for you, not sure if anyone has already built a package in `R` for multivariate GB2 solutions. – RobertF Jun 01 '21 at 23:13
  • 1
    Not specifically for GB2 solutions, but there is the *beginning* of something along these lines at https://github.com/queezzz/qzmle/ – Ben Bolker Jun 01 '21 at 23:57
0

There is also an error happening earlier on:

y <- rgb2(1, shape1, scale, shape2, shape3)
Error in rgb2(1, shape1, scale, shape2, shape3) : 
  could not find function "rgb2"

you may need to load the required package for this:

https://www.rdocumentation.org/packages/gamlss.dist/versions/5.3-2/topics/GB2

it appears to be in library(gamlss.dist)

  • Yes, the `rgb2()` function is included in the `GB2` package, but it should run ok once you've installed `GB2`. – RobertF May 27 '21 at 23:40
  • Sorry there's a typo - it should read `y <- rgb2(1000, shape1, scale, shape2, shape3)`. – RobertF May 28 '21 at 15:03