I am working with menstrual cycle data, and I want to investigate whether carrying an infection predicts the occurrence of pre-menstrual symptoms. In addition, I have no a priori reason to assume that the pre-menstrual phase lasts 3, 4 or 5 days (or fewer or more). I would therefore like to compare models with different versions of the variable "phase" to find the most relevant length of the pre-menstrual phase when predicting pre-menstrual symptoms as a cue to an infection.

However, for this sort of data and question I have to use glmmPQL, which does not compute an AIC, so I cannot use MuMIn or similar packages (I cannot get a QAIC either). For now I have compared the models using the ROCR package and its performance function, but I am not sure this is a reasonable method. Below I detail my dataset and the model(s). I have spent a lot of time online trying to find a way to compare glmmPQL models with a binomial response and a temporal autocorrelation structure, but none of the approaches I found works in my case (e.g. comparing gls models fails because my response is binomial, comparing lmer models fails because my data are autocorrelated). Any help would be greatly appreciated, thank you!

The dataset: each line describes a day, and days are repeated within women. In this example the data cover one menstrual cycle per woman only. The response variable is binary (0,1), as are the fixed variables "inf" (infected: yes/no) and "phase" (pre-menstrual/other). Because the data are temporally autocorrelated (symptoms on a given day are correlated with symptoms on the previous day within women), I use glmmPQL to include a temporal autocorrelation structure and the random effect id. This gives the model below (clength: cycle length; dcycle: day in cycle).
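
To give an idea of the structure, here is a sketch of data2 with invented values (only the column names, which are those used in the model call below, are real):

# example data structure (invented values; one row per woman-day) #
data2 <- data.frame(
  id      = factor(c(1, 1, 1, 2, 2, 2)),  # woman ID (random effect)
  dcycle  = c(24, 25, 26, 28, 29, 30),    # day in cycle (used by corCAR1)
  cramps  = c(0, 0, 1, 0, 1, 1),          # binary response: symptoms yes/no
  inf     = c(1, 1, 1, 0, 0, 0),          # binary: infected yes/no
  phase   = c(0, 0, 1, 0, 1, 1),          # binary: pre-menstrual phase vs other
  age     = c(28, 28, 28, 34, 34, 34),    # age in years
  clength = c(28, 28, 28, 31, 31, 31)     # cycle length in days
)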

# model # 
library(MASS)
library(nlme)
modc.PQL <- glmmPQL(cramps ~ inf * phase + log(age) + log(clength), random = ~ 1 | id, correlation = corCAR1(form = ~ dcycle | id), family = binomial, data = data2)

# inspecting model fit #
require(ROCR)
pr <- predict(modc.PQL)                          # predicted values (link scale by default; fine for AUC)
pred <- prediction(as.vector(pr), data2$cramps)  # pair predictions with the observed 0/1 response
auc.perf <- performance(pred, measure = "auc")
auc.perf@y.values

Now I would like to compare models with different versions of the variable phase (phase4 -> the 4 days before menstruation; phase5 -> the 5 days before menstruation, and so on).
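
Roughly, a phaseK dummy flags the last K days of the cycle, so the variables could be built along these lines (just a sketch, assuming dcycle runs from 1 to clength within each cycle):

# constructing the phase variables (sketch) #
data2$phase4 <- as.integer(data2$dcycle > data2$clength - 4)  # last 4 days of the cycle
data2$phase5 <- as.integer(data2$dcycle > data2$clength - 5)  # last 5 days
data2$phase6 <- as.integer(data2$dcycle > data2$clength - 6)  # last 6 days
data2$phase7 <- as.integer(data2$dcycle > data2$clength - 7)  # last 7 days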

# candidate set #
modc.PQL4 <- glmmPQL(cramps ~ inf * phase4 + log(age) + log(clength), random = ~ 1 | id, correlation = corCAR1(form = ~ dcycle | id), family = binomial, data = data2)
modc.PQL5 <- glmmPQL(cramps ~ inf * phase5 + log(age) + log(clength), random = ~ 1 | id, correlation = corCAR1(form = ~ dcycle | id), family = binomial, data = data2)
modc.PQL6 <- glmmPQL(cramps ~ inf * phase6 + log(age) + log(clength), random = ~ 1 | id, correlation = corCAR1(form = ~ dcycle | id), family = binomial, data = data2)
modc.PQL7 <- glmmPQL(cramps ~ inf * phase7 + log(age) + log(clength), random = ~ 1 | id, correlation = corCAR1(form = ~ dcycle | id), family = binomial, data = data2)

# inspecting model fit model by model #
require(ROCR)
pr <- predict(modc.PQL4)
pred <- prediction(as.vector(pr), data2$cramps)
auc.perf <- performance(pred, measure = "auc")
auc.perf@y.values

pr <- predict(modc.PQL5)
pred <- prediction(as.vector(pr), data2$cramps)
auc.perf <- performance(pred, measure = "auc")
auc.perf@y.values

etc...
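
The same AUC computation can also be run in a loop over the candidate models, which avoids copy-pasting the block above (a sketch, assuming the four models and data2 are available and ROCR is loaded):

# AUC for every candidate model in one go (sketch) #
models <- list(phase4 = modc.PQL4, phase5 = modc.PQL5,
               phase6 = modc.PQL6, phase7 = modc.PQL7)
sapply(models, function(m) {
  pred <- prediction(as.vector(predict(m)), data2$cramps)
  performance(pred, measure = "auc")@y.values[[1]]
})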

This does not feel right, because I don't have a set criterion for deciding whether the accuracy of two models really differs (like the rule of thumb that models within 2 AIC points are roughly equivalent). So all the models might in fact be roughly equivalent even though one of them has a slightly better AUC value.

Finally, if anyone knows how to produce a half-normal plot for a glmmPQL model, please let me know!

with many many thanks

Alex
