0

I have a quadratic term in a GLM and I am interested in the vertex value (+ the standard error and confidence interval of the vertex) of the quadratic term. To my knowledge, there is no automatic function for this purpose in R, and I am unable to calculate this manually, as I do not have sufficient knowledge of statistics and R. Is there anybody who could help and build the code to derive the values of interest?

GLM <- glm(Y ~ X1 + I(X1^2) + X2 + X3 + X4 + X5, family="binomial", data=DF)

Example of desired output with code from Stata

Kris
  • 21
  • 5
  • Are you confusing vertex (the minima of `f(x)=b + ax + cx^2`) with coefficient (a random variable/parameter to be estimated in a regression model)? – Baraliuh Aug 18 '22 at 19:55
  • @Baraliuh No, I am not. I know how to get the coefficient of the regression (```summary(GLM)``` or ```coef(GLM)```) but I would like to find the value of the vertex of the quadratic term, i.e. the x value of the point where the positive slope of the concave parabola turns to negative slope, plus of standard error (as we don’t know the value of the vertex with 100% accuracy). – Kris Aug 18 '22 at 20:02

1 Answers1

0

Warnings aside since I am not used to simulating data for these types of problems. Here is a vertex with CI based on a Bonferroni CI of the parameters. Since we need two parameters, we need to adjust the product of the two CIs such that x^2=1-alpha -> x=(1-alpha)^(1/2)

set.seed(1)
library(dplyr, warn.conflicts = F)
library(ggplot2, warn.conflicts = F)
logit_trans <- function(x) 1/(1/exp(x)+1)

sim_data <- tibble(
  x = rep(seq(0,20,.1), 10),
  y = round(logit_trans(20 - 8*x + .4*x^2 + rnorm(2010, 0, 1)))
)

model <- glm(
  y ~ x + I(x^2),
  binomial(),
  sim_data
)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

find_vertex <- function(model_pars){
  a <- model_pars[2]
  b <- model_pars[3]
  if(2*b>0){ #Check second deriv.
    -a/(2*b)
  } else{ # concave down
    NA
  }
}

ci_vertex <- function(model){
  mean_vertex <- model %>% 
    coef() %>% 
    find_vertex()
  ci <- model %>% 
    confint(parm = 2:3, level = (1-.05)^(1/2))
  lower_ci <- ci %>% 
    magrittr::extract(,1) %>% 
    c(0, .) %>% 
    find_vertex()
  upper_ci <- ci %>% 
    magrittr::extract(,2) %>% 
    c(0, .) %>% 
    find_vertex()
  list(
    lower_ci = lower_ci,
    mean_vertex = mean_vertex,
    upper_ci = upper_ci
  )
}

ci <- ci_vertex(model)
#> Waiting for profiling to be done...
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

sim_data %>% 
  mutate(
    x2 = x^2,
    y_expected = logit_trans(c(matrix(c(rep(1, n()), x, x2), ncol = 3) %*% matrix(coef(model))))
  ) %>% 
  ggplot(aes(x, y_expected)) + 
  geom_point() +
  geom_line() +
  geom_vline(xintercept = ci$lower_ci, color = 'red', linetype = 'dashed') +
  geom_vline(xintercept = ci$upper_ci, color = 'red', linetype = 'dashed') +
  geom_vline(xintercept = ci$mean_vertex, color = 'red')

Created on 2022-08-19 by the reprex package (v2.0.1)

The mean vertex is almost exactly 10 as we would expect from the input function for y. For some reason, the upper and lower CI are inverted and the lower one is slightly larger than expected. I am guessing this is due to model unidentifiability.

Baraliuh
  • 2,009
  • 5
  • 11
  • Thanks a lot. I run your code with my dataset and I’ve got an error in the last step: `Error in mutate(): ! Problem while computing y_expected = logit_trans(c(matrix(c(rep(1, n()), x, x2), ncol = 3) %*% matrix(coef(model)))). Caused by error in matrix(c(rep(1, n()), x, x2), ncol = 3) %*% matrix(coef(model)): ! non-conformable arguments` The only thing that is different in my model compared to yours is that I have more predictors in the model (only one of them is quadratic, others are linear). Could this cause the issue? – Kris Aug 19 '22 at 19:39
  • That is an issue with matrix multiplication. Just use `coef(model)[1:3]` to get the relevant parameters (intercept and `x`'s in the polynomial). – Baraliuh Aug 20 '22 at 08:22
  • Now I’ve got a different error: `Error: Discrete value supplied to continuous scale` – Kris Aug 20 '22 at 17:59
  • On the same line? Are you using large `X` as in your model or small as in my code? – Baraliuh Aug 20 '22 at 18:35
  • Yes, on the same line. I renamed the predictor to small `x` before running your code. – Kris Aug 20 '22 at 18:46
  • Also, I don’t know whether this is important, but I have concave upward instead of down (sorry for not specifying this before, I am used to distinguishing them as concave and convex rather than concave upward and concave downward) – Kris Aug 20 '22 at 18:58
  • Ah, then you can just remove the `if` clause, but that should not be where the error is coming from. You could try to use the `predict` function instead of solving the matrix multiplication to get the model predictions. – Baraliuh Aug 20 '22 at 20:08
  • I removed the `if` clause and it finally doesn’t show any errors, but the result is definitely wrong. It calculates the mean vertex at -2.425529e-07 but I can visually see that it should be somewhere around 3,000. See plot here: [![Screenshot-2022-08-20-at-22-08-16.png](https://i.postimg.cc/7YXg74ck/Screenshot-2022-08-20-at-22-08-16.png)](https://postimg.cc/4m77kDsL) – Kris Aug 20 '22 at 21:11
  • Umn, that is very weird. The calculation is based on the derivative being equal to zero and then estimating the error based on the variance of the parameters. At least the mean estimate should be correct, I realized now that the boundaries might need to be adjusted to the four corners (min,min; min,max; ,max,min; max,max) of parameter CI's.. I assume you disregarded the other covariates (i.e, they are `0`)? – Baraliuh Aug 20 '22 at 22:28
  • Sorry I am not sure what you mean by this, can you clarify? – Kris Aug 21 '22 at 16:51