2

I'm getting a subscript out of bounds error when using tidy() on a svyglm where one of the covariates does not vary:

library(broom)
library(tidyverse)
library(survey)

test.df <- tibble(
  id = c(1,2,3,4),
  weights = c(1,1,1,1),
  outcome = c(0,0,1,1),
  var1 = c(1,1,1,1),
  var2 = c(5,8,10,6)
)

svyglm(formula = outcome ~ var1 + var2,
       design = svydesign(ids = ~id, weights = ~weights, data = test.df),
       family = "binomial") %>%
  tidy(exp = TRUE,
       conf.int = TRUE)
#> Error in CI[piv, , drop = FALSE]: subscript out of bounds

The error seems to be sensitive to the order in which the covariates are specified in the formula, which was surprising to me. If the order is reversed, the error does not occur:

svyglm(formula = outcome ~ var2 + var1,
       design = svydesign(ids = ~id, weights = ~weights, data = test.df),
       family = "binomial") %>%
  tidy(exp = TRUE,
       conf.int = TRUE)
#> # A tibble: 2 x 7
#>   term        estimate std.error statistic p.value  conf.low conf.high
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>     <dbl>     <dbl>
#> 1 (Intercept)   0.0403     4.05     -0.792   0.511 0.0000144    113.  
#> 2 var2          1.56       0.508     0.875   0.474 0.576          4.22

The error only occurs when requesting confidence intervals:

svyglm(formula = outcome ~ var1 + var2,
       design = svydesign(ids = ~id, weights = ~weights, data = test.df),
       family = "binomial") %>%
  tidy(exp = TRUE)
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   0.0403     4.05     -0.792   0.511
#> 2 var2          1.56       0.508     0.875   0.474

Might this be an issue with broom, or am I missing something? Why would tidy() be sensitive to the order of covariates in the formula?

lost
  • 1,483
  • 1
  • 11
  • 19
  • One of your covariates is a vector of all the same numbers (`var1`). That's what's causing the issue. If you change even one value in `var1`, then the order doesn't matter, `svyglm` will return confidence intervals. – andrew_reece Feb 04 '20 at 05:00
  • @andrew_reece I understand that a term cannot be fit when it does not vary. `svyglm` handles this gracefully by simply dropping the term. Note that the `tidy()` error does not occur when `tidy()` is used on an otherwise identical `glm` model with these same data. What's at issue is why `tidy()` is failing ungracefully depending on the order of terms, and only on a `svyglm` object. – lost Feb 04 '20 at 05:04
  • This is a better formulation of your question, and I would recommend updating your answer to make it more clear this is what you're after. My guess is that `tidy` has assumptions about what a `glm` object will look like, and `svyglm` alters the `glm` object (e.g. by gracefully dropping terms) to the point where it's violating `tidy`'s expectations. A quick `str()` comparison of `glm` vs `svyglm` fit objects shows that `svyglm` not only rearranges the order of attributes, but it renames some (like `data -> variables`). this or something like it is almost surely causing problems for `tidy`. – andrew_reece Feb 04 '20 at 05:44

0 Answers0