
My question relates to calculating the standard deviation (SD) of transition probabilities derived from coefficients estimated through Weibull regression in Stata.

The transition probabilities are used to model disease progression in leukemia patients over 40 cycles of 90 days (about 10 years). I need the SDs of the probabilities (which change over the run of the Markov model) to create beta distributions whose parameters can be approximated from the corresponding Markov-cycle probability and its SD. These distributions are then used for probabilistic sensitivity analysis: they replace the point-estimate probabilities (one per cycle), and random draws from them are used to assess the robustness of the model's cost-effectiveness results.
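For reference, here is a minimal sketch of that approximation (a method-of-moments fit of the beta parameters); the values of p and sd are placeholders rather than numbers from my model:

* Sketch: method-of-moments beta parameters from one cycle's probability and SD
scalar p     = 0.70                           // placeholder cycle probability
scalar sd    = 0.05                           // placeholder SD of that probability
scalar alpha = p*(p*(1 - p)/sd^2 - 1)         // beta shape parameters
scalar beta  = (1 - p)*(p*(1 - p)/sd^2 - 1)
display "alpha = " alpha "   beta = " beta
display "one PSA draw: " rbeta(alpha, beta)   // random draw for the sensitivity analysis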

Anyway, using time-to-event survival data, I've used regression analysis to estimate coefficients that can be plugged into an equation to generate transition probabilities. For example...

. streg, nohr dist(weibull)

        failure _d:  event
   analysis time _t:  time

Fitting constant-only model:

Iteration 0:   log likelihood = -171.82384
Iteration 1:   log likelihood = -158.78902
Iteration 2:   log likelihood = -158.64499
Iteration 3:   log likelihood = -158.64497
Iteration 4:   log likelihood = -158.64497

Fitting full model:
Iteration 0:   log likelihood = -158.64497  

Weibull regression -- log relative-hazard form 

No. of subjects =           93                     Number of obs   =        93
No. of failures =           62
Time at risk    =        60250
                                                   LR chi2(0)      =     -0.00
Log likelihood  =   -158.64497                     Prob > chi2     =         .

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -4.307123   .4483219    -9.61   0.000    -5.185818   -3.428429
-------------+----------------------------------------------------------------
       /ln_p |  -.4638212   .1020754    -4.54   0.000    -.6638854    -.263757
-------------+----------------------------------------------------------------
           p |    .628876   .0641928                      .5148471    .7681602
         1/p |   1.590139   .1623141                      1.301812    1.942324

We then create the probabilities with an equation that uses p and _cons, as well as t for time (i.e., Markov cycle number) and u for cycle length (usually a year; mine is 90 days since I'm working with leukemia patients who are very likely to have an event, i.e., relapse or death).

So, where lambda = p and gamma = exp(_cons):

gen result = exp(lambda*((t-u)^gamma) - lambda*(t^gamma))

gen transitions = 1 - result
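To make those two lines concrete, here is a rough sketch of the setup they assume, using the same names (the 40 cycles of 90 days are as described above; the coefficient references mirror the nlcom call below, and the rest is purely illustrative):

scalar lambda = exp(_b[/ln_p])     // = p in the streg output above
scalar gamma  = exp(_b[_cons])
clear                              // drop the survival data; the scalars are kept
set obs 40                         // one observation per 90-day Markov cycle
gen t = _n*90                      // end of each cycle, in days
scalar u = 90                      // cycle length, in days
gen result = exp(lambda*((t-u)^gamma) - lambda*(t^gamma))
gen transitions = 1 - result
list t transitions in 1/4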

Turning to the variability, I first calculate the standard errors of the exponentiated coefficients:

. nlcom (exp(_b[_cons])) (exp(_b[/ln_p]))

       _nl_1:  exp(_b[_cons])
       _nl_2:  exp(_b[/ln_p])

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .0116539   .0044932     2.59   0.009     .0028474    .0204604
       _nl_2 |   .6153864    .054186    11.36   0.000     .5091838     .721589

But what I'm really after is the standard errors of the transitions values themselves, e.g.,

nlcom (_b[transitions])

But this doesn't work, and the book I'm using doesn't give any hints on getting at this extra information. Any feedback on how to get closer would be much appreciated.

Emily
    Please be careful about cross-posting. Note Statalist etiquette at http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting and comments at e.g. http://meta.stackexchange.com/questions/22657/is-it-wrong-to-ask-the-same-question-elsewhere – Nick Cox Mar 01 '14 at 00:17
  • In general references to `_b[varname]` will work only if `varname` was a predictor in the previously fitted model. Is that the case here? – Nick Cox Mar 01 '14 at 00:19
  • Hi, no, it is not. I'm plugging the predictors into an equation and want to get the standard deviation of the results of that equation. I suppose since that's an equation and not a model, there's no variance associated with the results. Perhaps I need to focus on getting a joint standard deviation for the predictors. Thanks for the cross-posting info. – Emily Mar 03 '14 at 19:22

2 Answers


This answer is incorrect, but I keep it because it generated some useful comments.

sysuse auto, clear
gen u = 90 + rnormal()

set seed 1234
capture program drop _all

program define myprog, rclass
    tempvar result
    reg turn disp /* Here substitute your -streg- statement */
    gen `result' = _b[disp]*u
    sum `result'
    return scalar sd = r(sd)
end

bootstrap sdr = r(sd): myprog
estat bootstrap, bc percentile

Of note: in the bootstrapped program, the new variable (your result) must be defined as temporary; otherwise the gen statement will lead to an error because the variable is created anew for each bootstrap replicate.

Steve Samuels
  • A minute tweak (and speed-up) is that you don't need the `detail` option of `summarize` here. SDs are calculated by default. – Nick Cox Mar 11 '14 at 10:31
  • Thanks, Nick. I had been thinking of -mean-. I've deleted the `detail` option. – Steve Samuels Mar 11 '14 at 21:54
  • Thanks, Steve! This method returns a _cons SE that is very close to the Weibull one. However, we're still interested in the combination, hence the initial nlcom. A temporary work-around is the binomial `√(p(1-p)/n)`, where p is the transition probability. But the Weibull regression creates 2 coefficients used to calculate the p's, and the bootstrapped results need to be a combination of them. `streg, nohr dist(weibull)` followed by `gen `result' = _b[_cons]*u _b[ln_p]*u` doesn't work, nor does creating a variable that combines these 2 in the way that the transitions are calculated. Any further thoughts are welcome and appreciated. – Emily Mar 13 '14 at 18:49

Second Answer: 2014-03-23

Update: 2014-03-26 I fixed the negative probabilities: I'd made an error in transcribing Emily's code. I also show the nlcom approach suggested on Statalist by Austin Nichols (http://www.stata.com/statalist/archive/2014-03/msg00002.html); I made one correction to Austin's code.

Bootstrapping is still the key to the solution. The target quantities are probabilities calculated by a formula that is a nonlinear combination of parameters estimated by streg. Because these probabilities are not themselves contained in the matrix e(b) returned by streg, they cannot simply be referred to by name in nlcom. This is an ideal situation for bootstrapping. The standard approach is adopted: create a program myprog that estimates the quantities, then bootstrap that program.

In the example, transition probabilities pt for a range of t values are estimated. The user must set the minimum and maximum of the t range as well as a scalar u. Of interest, perhaps, is that, since the number of estimated parameters is variable, a forvalues loop is required inside myprog. Also, bootstrap requires an argument that consists of a list of the estimates returned by myprog; this list is also constructed in a forvalues loop.

/* set u  and minimum and maximum times here */
scalar u = 1
local tmin = 1
local tmax = 3

set linesize 80

capture program drop _all

program define myprog , rclass
    syntax anything
    streg, nohr dist(weibull)
    scalar lambda = exp(_b[ln_p:_cons])
    scalar gamma  = exp(_b[_t:_cons])

    forvalues t = `1'/`2' {
        scalar p`t' = 1 - ///
            (exp((lambda*((`t'-u)^(gamma))) - (lambda*(`t'^(gamma)))))
        return scalar p`t' = p`t'
    }
end


webuse cancer, clear
stset studytime, fail(died)
set seed 450811

/* set up list of returned probabilities for bootstrap */
forvalues t = `tmin'/`tmax' {
    local p`t' = "p" + string(`t')
    local rp`t' = "`p`t''" + "=" + "(" + "r(" + "`p`t''" + "))"
    local rlist = `"`rlist' `rp`t''"'
}

bootstrap `rlist', nodots: myprog `tmin' `tmax'
forvalues t = `tmin'/`tmax' {
    qui streg, nohr dist(weibull)
    nlcom 1 - ///
        (exp((exp(_b[ln_p:_cons])*((`t'-u)^(exp(_b[_t:_cons]))))- ///
        (exp(_b[ln_p:_cons])*(`t'^(exp(_b[_t:_cons]))))))
}

Results:

Bootstrap results                               Number of obs      =        48
                                                Replications       =        50
      command:  myprog 1 3
           p1:  r(p1)
           p2:  r(p2)
           p3:  r(p3)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          p1 |   .7009447   .0503893    13.91   0.000     .6021834    .7997059
          p2 |   .0187127    .007727     2.42   0.015     .0035681    .0338573
          p3 |   .0111243   .0047095     2.36   0.018     .0018939    .0203548
------------------------------------------------------------------------------

/* results of  nlcom */
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .7009447   .0543671    12.89   0.000      .594387    .8075023
-------------+----------------------------------------------------------------
       _nl_1 |   .0187127   .0082077     2.28   0.023     .0026259    .0347995
-------------+----------------------------------------------------------------
       _nl_1 |   .0111243   .0049765     2.24   0.025     .0013706    .0208781
------------------------------------------------------------------------------
Steve Samuels
  • Steve, thanks for this great intro to bootstrapping. I see now why the program is necessary: to produce the parameter estimates. Since the program is using streg rather than nlcom, I'll substitute for the _b's and the bootstrapped program results will have a SE for each t. (But resulting coefficients shouldn’t be negative. I'll work on that.) – Emily Mar 26 '14 at 18:55
  • You are welcome, Emily. Note that you will need a lot more than the default 50 bootstrap replicates to get reasonable SEs (~500) and perhaps 1,000 to get good CIs. Also, if you remove the `quietly` (`qui`) prefix, you will see the default bootstrap normal-based standard errors, which are the same as those listed; however the corresponding CIs are likely to be poor. I like bc-corrected CIs, which is why I showed them. Removing `qui` will also show if -streg- fails to find a solution for some replicates: look at the "dot" history. – Steve Samuels Mar 26 '14 at 21:30
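For illustration, the bootstrap call from this answer could be rerun along the lines below to get the larger number of replications and the bias-corrected intervals mentioned in the comment above (this reuses `rlist', `tmin', and `tmax' from the code earlier in the answer; 1,000 replications is simply the figure suggested there):

bootstrap `rlist', reps(1000) nodots: myprog `tmin' `tmax'
estat bootstrap, bc percentile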