
New to stackoverflow. I'm working on a project with NHIS data, but I cannot get the svyglm function to work even for a simple, unadjusted logistic regression with a binary predictor and binary outcome variable (ultimately I'd like to use multiple categorical predictors, but one step at a time).

El_under_glm<-svyglm(ElUnder~SO2, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)

Error in eval(extras, data, env) : object '.survey.prob.weights' not found

I changed the variables to 0 and 1 instead:

Under_narm$SO2REG<-ifelse(Under_narm$SO2=="Heterosexual", 0, 1)
Under_narm$ElUnderREG<-ifelse(Under_narm$ElUnder=="No", 0, 1)

But then get a different issue:

El_under_glm<-svyglm(ElUnderREG~SO2REG, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)

Error in svyglm.survey.design(ElUnderREG ~ SO2REG, design = SAMPdesign, : all variables must be in design= argument

This is the design I'm using to account for the weights -- I'm pretty sure it's correct:

SAMPdesign=svydesign(data=Under_narm, id= ~NHISPID, weight= ~SAMPWEIGHT)

Any and all assistance appreciated! I've got a good grasp of stats but am a slow coder. Let me know if I can provide any other information.

  • is this the cdc's national health interview survey from ipums? i'm confused why your `svydesign()` line doesn't match ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2019/srvydesc-508.pdf#page=33 ? sorry if i'm overlooking something.. – Anthony Damico Mar 08 '21 at 15:56
  • @AnthonyDamico you're absolutely correct -- I was attempting to use a subset, but forgot that the documentation for R you linked has me subsetting differently (which should help me avoid the rescaling issue altogether). Thank you and apologies! – Olivia Sullivan Mar 08 '21 at 18:49
  • This same error can be reproduced if one or more variables you placed in your model is not in your dataset. Make sure all variables in your model are in the dataset you are using. – Alien Jan 07 '22 at 23:42
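A quick way to check for this, sketched in base R: compare the variables named in the model formula with the columns of the data. The data frame below is made up for illustration. With a real design object the columns would typically be found in `SAMPdesign$variables`, and if the design was created before the recoded columns were added, re-running `svydesign()` on the updated data frame would presumably be needed.

```r
# Hypothetical stand-in for the data the design was built from
# (the recoded *REG columns have not been added yet).
design_data <- data.frame(
  SO2 = c("Heterosexual", "Other", "Heterosexual"),
  ElUnder = c("No", "Yes", "No"),
  SAMPWEIGHT = c(1.5, 2, 0.5)
)

model_formula <- ElUnderREG ~ SO2REG

# Variables used in the formula that are absent from the data
missing_vars <- setdiff(all.vars(model_formula), names(design_data))
print(missing_vars)

# After recoding, the same check finds nothing missing
design_data$SO2REG <- ifelse(design_data$SO2 == "Heterosexual", 0, 1)
design_data$ElUnderREG <- ifelse(design_data$ElUnder == "No", 0, 1)
missing_after <- setdiff(all.vars(model_formula), names(design_data))
print(missing_after)
```

If `missing_vars` is non-empty for the data stored in the design, the model cannot be fitted until those columns exist there.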

1 Answer


Using some make-believe sample data, I was able to get your model to run by setting rescale = TRUE. The documentation states:

Rescaling of weights, to improve numerical stability. The default rescales weights to sum to the sample size. Use FALSE to not rescale weights.

So one solution may be simply to set rescale = TRUE.

library(survey)
  # sample data
  Under_narm <- data.frame(SO2 = factor(rep(1:2, 1000)),
                           ElUnder = sample(0:1, 1000, replace = TRUE),
                           NHISPID = paste0("id", 1:1000),
                           SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE))
                           
  # with 'rescale' = TRUE
  SAMPdesign=svydesign(ids = ~NHISPID,
                       data=Under_narm,
                       weights = ~SAMPWEIGHT)
 
  El_under_glm<-svyglm(formula = ElUnder~SO2, 
                       design=SAMPdesign,
                       family=quasibinomial(), # this family avoids warnings
                       rescale=TRUE) # weights rescaled to sum to the sample size
  
  summary(El_under_glm, correlation = TRUE) # use correlation with summary()
  

Otherwise, looking at the code for this function's method with 'survey:::svyglm.survey.design', it seems like there may be a bug. I could be wrong, but by my read, when 'rescale' is FALSE, .survey.prob.weights never gets assigned a value.

    if (is.null(g$weights)) 
      g$weights <- quote(.survey.prob.weights)
    else g$weights <- bquote(.survey.prob.weights * .(g$weights)) # bug?
    g$data <- quote(data)
    g[[1]] <- quote(glm)
    if (rescale) 
      data$.survey.prob.weights <- (1/design$prob)/mean(1/design$prob)

There may be a workaround if you assign a numeric vector to .survey.prob.weights in the global environment. I have no idea what these values should be, but your error goes away if you do something like the following. (.survey.prob.weights needs one value per row of the design data, which is 2000 here: the sample data frame ends up with twice the 1000 rows suggested by most of its columns.)

SAMPdesign=svydesign(ids = ~NHISPID,
                     data=Under_narm,
                     weights = ~SAMPWEIGHT)

.survey.prob.weights <- rep(1, 2000)

El_under_glm<-svyglm(formula = ElUnder~SO2, 
                     design=SAMPdesign,
                     family=quasibinomial(), 
                     rescale=FALSE)

summary(El_under_glm, correlation = TRUE)
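As a small check on where the 2000 comes from, here is a base R sketch that rebuilds the sample data and counts its rows. `rep(1:2, 1000)` has 2000 elements, so `data.frame()` recycles the 1000-element columns to match. The `set.seed()` call is an addition for reproducibility and is not part of the code above.

```r
set.seed(1)  # assumption: only here to make the made-up data reproducible

# Same construction as the sample data above
Under_narm <- data.frame(SO2 = factor(rep(1:2, 1000)),            # 2000 values
                         ElUnder = sample(0:1, 1000, replace = TRUE),  # 1000 values
                         NHISPID = paste0("id", 1:1000),          # 1000 values
                         SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE))

n_rows <- nrow(Under_narm)                     # rows after recycling
n_ids <- length(unique(Under_narm$NHISPID))    # distinct ids

print(c(rows = n_rows, ids = n_ids))
```

Each id therefore appears twice, which is why a weight vector of length 1000 would not line up with the data. Using `rep(1:2, 500)` instead would give 1000 rows with one row per id.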
xilliam
  • Thanks a ton! I re-ran the SAMPdesign line, changed to quasibinomial, and put rescale=TRUE, and that combination seemed to work. I'll probably have to fiddle around with .survey.prob.weights since I think rescaling affects the interpretability of the results, but this is a huge win this morning. Thanks again! – Olivia Sullivan Mar 04 '21 at 17:01
  • Rescaling does not affect the interpretability of the results; it's purely a computational issue. – Thomas Lumley Mar 18 '21 at 22:51
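The point about rescaling can be illustrated with a minimal sketch using plain `glm()` rather than `svyglm()`, on made-up data. It assumes that rescaling means dividing the weights by their mean, as in the source excerpt quoted in the answer. Note that the standard errors here are the model-based ones from `glm()`, not the design-based ones that `svyglm()` reports.

```r
set.seed(42)  # made-up data, for illustration only
n <- 500
x <- rep(0:1, length.out = n)
y <- rbinom(n, 1, ifelse(x == 1, 0.6, 0.4))
w <- sample(c(0.5, 2), n, replace = TRUE)

# Same model, once with the raw weights and once with rescaled weights
fit_raw <- glm(y ~ x, family = quasibinomial(), weights = w)
fit_rescaled <- glm(y ~ x, family = quasibinomial(), weights = w / mean(w))

# Largest absolute difference in coefficients and in standard errors
coef_diff <- max(abs(coef(fit_raw) - coef(fit_rescaled)))
se_raw <- summary(fit_raw)$coefficients[, "Std. Error"]
se_rescaled <- summary(fit_rescaled)$coefficients[, "Std. Error"]
se_diff <- max(abs(se_raw - se_rescaled))

print(c(coef_diff = coef_diff, se_diff = se_diff))
```

Both differences should be zero up to numerical tolerance: multiplying every weight by the same constant leaves the estimating equations unchanged, and the quasibinomial dispersion estimate absorbs the constant in the standard errors.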