I want to model the probability that someone enrolls (or does not enroll) in an energy-saving program. My data and model look like this:

ID  Buildingtype   Energyconsumption  Enrollment  Zip Code
1   Detached       2000               1           1111
2   Detached       2200               0           2222
3   Semi Detached  1700               0           2299
4   Detached       1500               1           3902

glm.fit <- glm(Enrollment ~ Buildingtype + Energyconsumption, data = df, family = "binomial")

The dataset is large and contains more than 300 zip codes. How can I add zip code to the formula so that the model accounts for observed and unobserved locational characteristics of each area?

TvCasteren

1 Answer

If the variable ZipCode is already a factor, you can add it to your regression model like any other variable. Otherwise, convert it to a factor inside the model call, so that R estimates a separate effect for each zip code instead of treating the code as a number:

glm.fit <- glm(Enrollment ~ Buildingtype + Energyconsumption + as.factor(ZipCode), data = df, family = "binomial")
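
To show what the factor term does, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data are made up, only 20 zip codes are used instead of 300, the column is assumed to be named ZipCode, and the model object is called fit (a name chosen here so it does not mask the built-in function stats::glm.fit).

```r
# Simulated data; all values are made up for illustration only.
set.seed(42)
n <- 2000
df <- data.frame(
  Buildingtype      = sample(c("Detached", "Semi Detached"), n, replace = TRUE),
  Energyconsumption = round(rnorm(n, mean = 1800, sd = 300)),
  ZipCode           = sample(1001:1020, n, replace = TRUE)  # stored as integer
)

# Hypothetical zip-level effect that the zip code dummies should absorb
zip_effect <- rnorm(20, sd = 0.5)
eta <- -1 + 0.0005 * df$Energyconsumption + zip_effect[df$ZipCode - 1000]
df$Enrollment <- rbinom(n, size = 1, prob = plogis(eta))

# Convert once in the data frame instead of inside every model call
df$ZipCode <- factor(df$ZipCode)

fit <- glm(Enrollment ~ Buildingtype + Energyconsumption + ZipCode,
           data = df, family = binomial)

# Intercept + 1 building-type dummy + 1 slope + (20 - 1) zip code dummies
length(coef(fit))
```

The first factor level serves as the reference category, so each zip code coefficient is a shift in log-odds relative to that zip code. With over 300 zip codes the model gains a correspondingly large number of coefficients, and zip codes with very few observations will have imprecise estimates.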
Timon