1

Recently I have processed and cleaned a data for the aim of application of a negative binomial regression.

First, I tried to use the function glm.nb of package MASS in R and I had a problem with ensuring that the model will realize the data are for one unique participant (possible correlations in a group of observations).

Then, I realized that I can use glmmPQL of package MASS or glmer of package lme4 and use the family negative binomial in it’s family link.

The question is I would like to know in which part of the model I can embed the offset (logarithm of the number of days of treatment) also how should I insert the time-constant observations for an id (such as gender and baseline age in the df)?

My latest attempt was:

(glmmPQL (event ~ treatment + offset (log(person.time)) , 
random= list (id=~1, gender=~1, baseline.age=~1), 
family= negative.binomial (theta=1.75), data=df ))

which faced with a memory-related error (probably because of the wrong code). data example:

df<-data.frame(id=rep(1:3,each=4),treatment=sample(c(0,1),12,replace = T),
event=sample(c(0,1),12,replace = T),
person.time=sample(c(15,31,30),12,replace = T),
age=rep(c(65,58,74),each=4),gender=rep(c("m","f","m"),each=4))
Amir
  • 21
  • 6

1 Answers1

1

I would like to answer my own question here, so that might be useful for some researchers in future.

First of all, I realized that lme4 package might be a better option for mixed-effects model compared to glmPQL from MASS package.

Second, I realized that when we use a glmer model on a data that we are interested in the calculation of a rate ratio such as incidence rate ratio (as in our example) we need to use a time period as in our offset argument and in case one is interested in analysis of time series with binary outcome variable one should use the duration between two observations in a row instead of a cumulative sum of those periods (so, this was the main problem I faced with when I used glmer fucntion of lme4 package).

Inclusion of arguments within the model function, such as those used in Survival package ("tstart" and "tstop" for Surv function), for simplifying the model formula for researchers with other backgrounds and less familiarity with R will be very useful.

The code to run the model on sample data above:

tdpm<-glmer(event ~ treatment + age + gender + (1|id), offset=(log(person.time)), data = df ,verbose = T, family = "poisson")

To test the correct person-time we used, we can calculate the rates manually and compare that to the exponentiation of rate ratio in further step;

(sum(df$event[df$treatment==1])/sum(df$person.time[df$treatment==1]))/(sum(df$event[df$treatment==0])/sum(df$person.time[df$treatment==0]))
exp(fixef(tdpm))[2]

However, for reasons that you can find in methods of analysing the rates and also other more sophisticated methods developed by Robins and Hernan etal. and published in chapters 12, 19, and 21 of their Causal Inference What If book, we need to add an indicator in the model to let the model realizes the order of the consecutive person-times (or adding the next person-time to the previous ones).

Amir
  • 21
  • 6