Is there an idiomatic way in R to fit a separate logistic regression for each level of a particular factor without writing an explicit loop?

For example, assume df has 365 daily rows with a binary column that says whether or not it rained:

multifactorglm(x){
  glm(rained ~ temp + humidity, data=x, family="binomial")
}
tapply(df, month, multifactorglm)

This won't run in R with the following message...

Error: unexpected '{' in "multifactorglm(x){"
>   glm(rained ~ temp + humidity, data=x, family="binomial")
Error in eval(predvars, data, env) : 
  numeric 'envir' arg not of length one
> }
Error: unexpected '}' in "}"
> 

I would like to have as a result a vector of 12 glm regressions, but I don't want to use a loop. What do I do?

JoeBass
  • This looks more like a typo. Could you please provide a reproducible example? Also ... see `lme4::lmList` (which does GLMs as well as linear models) – Ben Bolker May 06 '15 at 16:33
  • Agree the error message suggests a typo. `tapply` is a disguised loop. If there is no compelling reason, such as a homework problem specification, then just use the direct and equally efficient `for` loop passing the month value and using the subset argument to `glm`. I can think of an entirely loopless method using 'month' as an interaction with the other variables but interpreting the output will seem a tad Baroque. If you want a vector or matrix, you will need to extract the coefficients because the result of `glm` is a list. – IRTFM May 06 '15 at 16:36
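
The loop-free interaction approach mentioned in the comment above could look roughly like the sketch below. The data are simulated purely for illustration (the column names follow the question, the values and the 30-rows-per-month layout are assumptions), and the formula is one possible way to write the interaction, not necessarily what the commenter had in mind.

```r
# Simulated stand-in data: 12 months x 30 days (all values are made up).
set.seed(1)
df <- data.frame(
  month    = factor(rep(month.abb, each = 30), levels = month.abb),
  temp     = rnorm(360, mean = 15, sd = 8),
  humidity = runif(360, min = 20, max = 100)
)
df$rained <- rbinom(360, size = 1, prob = plogis(-2 + 0.03 * df$humidity))

# "0 + month" gives every month its own intercept; the two interaction
# terms give every month its own temp and humidity slope.
fit <- glm(rained ~ 0 + month + month:temp + month:humidity,
           data = df, family = "binomial")

# The 36 coefficients come out as 12 intercepts, 12 temp slopes and
# 12 humidity slopes, so they can be reshaped into a 12 x 3 matrix.
coefs <- matrix(coef(fit), nrow = 12,
                dimnames = list(levels(df$month),
                                c("(Intercept)", "temp", "humidity")))
```

Because a binomial GLM has no dispersion parameter shared across months, each row of `coefs` should agree with a separate per-month fit up to convergence tolerance. The single model is harder to read, which is presumably the "Baroque" part.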

1 Answer

I think the error is caused by not using R's syntax for defining a function (and there is a further error: column names such as "month" are not available as global variables). Try instead:

multifactorglm <- function(x) {
  glm(rained ~ temp + humidity, data = x, family = "binomial")
}
# by() splits df on month and calls the function on each subset
do.call(rbind, by(df, df$month, multifactorglm))

If you really wanted an entirely numeric result it might be:

multifactorglm <- function(x) {
  # return only the fitted coefficients
  coef(glm(rained ~ temp + humidity, data = x, family = "binomial"))
}
do.call(rbind, by(df, df$month, multifactorglm))

...which I think will be a matrix with 3 columns (the intercept and two parameter columns), although it's untested in the absence of data. Looking at my first effort, I realized that tapply will not split data frames properly. You probably need to use either lapply(split(df, df$month), multifactorglm) or the by function, which internally uses tapply on the row indices.
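
Since the question came without data, here is a minimal sketch of the lapply(split(...)) route on simulated data. Everything about the data frame is an assumption made for illustration (a `month` factor derived from 2015 dates, made-up `temp` and `humidity` values); only the model formula comes from the question.

```r
# Simulated stand-in for the asker's data: 365 daily rows (values are made up).
set.seed(42)
days <- seq(as.Date("2015-01-01"), by = "day", length.out = 365)
df <- data.frame(
  month    = factor(format(days, "%m")),
  temp     = rnorm(365, mean = 15, sd = 8),
  humidity = runif(365, min = 20, max = 100)
)
df$rained <- rbinom(365, size = 1, prob = plogis(-2 + 0.03 * df$humidity))

multifactorglm <- function(x) {
  glm(rained ~ temp + humidity, data = x, family = "binomial")
}

# split() cuts df into 12 data frames, lapply() fits one model to each,
# giving a named list of 12 glm objects.
fits <- lapply(split(df, df$month), multifactorglm)

# Stack the coefficients into a 12 x 3 numeric matrix (one row per month).
coefs <- t(sapply(fits, coef))
```

Individual fits stay available as, for example, `summary(fits[["03"]])`, which is usually more useful than a matrix of the glm objects themselves.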

IRTFM