5

I have an R dataframe, strongly simplified as:

id <- rep(1:2, c(6,8))
correct <- sample(0:1,14,TRUE)
phase <- c(rep("discr",3),rep("rev",3), rep("discr",4),rep("rev",4))
dat <- data.frame(id,correct,phase)

with id as my subjects (in reality I have a lot more than 2), correct = responses coded as incorrect (0) or correct (1), and the phases Discrimination and Reversal (within-subjects factor).

I want to perform a logistic regression in the form of

glm(correct~phase, dat, family="binomial")

later possibly adding additional predictors. However, since I have a varying amount of data for each subject, I would like to perform glm() seperately for each subject and later compare the coefficients with ANOVA for group effects. I would like to do this in a for loop in the form of

for(i in seq_along(dat$id)){
   my_glm[i] <- glm(correct~list,dat[dat$id==i,],family="binomial")
}

but keep receiving the error message

>Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
 contrasts can be applied only to factors with 2 or more levels.

I have checked my data and there is no factor which contains only one level. All subjects gave at least one incorrect and one correct response, and all took part in Discrimination and Reversal. The function works outside the loop when I specify a particular subject.

Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
Lucy Vanes
  • 237
  • 2
  • 7
  • First problem is that function seq_along(dat$id) produces numbers from 1:14 but you have only 2 levels. You can replace it with seq_along(unique(dat$id)). – Didzis Elferts Jan 23 '13 at 15:31

2 Answers2

5

Here's an R Base solution:

> lapply(split(dat, dat$id), function(x) coef(summary(glm(correct~phase,family="binomial",data=x))))
$`1`
                 Estimate Std. Error       z value  Pr(>|z|)
(Intercept) -6.931472e-01   1.224745 -5.659524e-01 0.5714261
phaserev    -3.845925e-16   1.732050 -2.220446e-16 1.0000000

$`2`
                Estimate Std. Error      z value Pr(>|z|)
(Intercept) 3.356998e-16   1.000000 3.356998e-16 1.000000
phaserev    1.098612e+00   1.527524 7.192109e-01 0.472011
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
4

you currently trying to do a glm for each row in of id:

I think you want a glm for each id seperately. Personally, I would go with something like:

library(plyr)

ddply(dat, .(id), function (x){
intercept <- coef(summary(glm(correct~phase,family="binomial",data=x)))[1]
slope     <- coef(summary(glm(correct~phase,family="binomial",data=x)))[2]
c(intercept,slope)                 
                              })

# id         V1            V2
#1  1 -0.6931472  1.386294e+00
#2  2  1.0986123 -6.345448e-16

# here V1 is intercept and V2 is the estimate
user1317221_G
  • 15,087
  • 3
  • 52
  • 78