I have data with an ordered factor response ("low", "medium", "high") and a large number of predictor variables. I'm pursuing an ordinal regularized regression model using the ordinalNet
package.
This is the first time I've used this package, so I tested it on simulated data. However, the coefficients have the opposite sign that I'd expect.
On this simulated data, I expect a positive coefficient for x
. This can be seen from the plot (apparent higher probability of medium and then high for increasing values of x
). A non-regularized method (MASS::polr
) produces the positive coefficient I expect. However, the coefficient for x
produced by ordinalNet
is negative.
Appreciate any help!
library(tidyverse)
library(MASS)
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#>
#> select
library(ordinalNet)
# simulated data
dat <-
expand_grid(
repeats = 1:10,
id = 1:3
) %>%
mutate(
x = id + runif(length(id), -1, 1),
x2 = runif(length(id), -1, 1),
y = factor(c("low", "med", "high"), c("low", "med", "high"), ordered = TRUE)[id],
)
# inspect
dat
#> # A tibble: 30 × 5
#> repeats id x x2 y
#> <int> <int> <dbl> <dbl> <ord>
#> 1 1 1 0.498 0.539 low
#> 2 1 2 1.31 -0.406 med
#> 3 1 3 2.30 0.774 high
#> 4 2 1 0.163 0.408 low
#> 5 2 2 1.99 -0.504 med
#> 6 2 3 3.86 0.931 high
#> 7 3 1 0.822 -0.799 low
#> 8 3 2 1.96 -0.267 med
#> 9 3 3 2.83 0.769 high
#> 10 4 1 0.127 0.139 low
#> # … with 20 more rows
# confirm that high values of x produce higher probability of medium and then high
dat %>%
ggplot() +
aes(x, y) +
geom_point()
# with MASS::polr
# https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/
model <- polr(y ~ x + x2, data = dat)
coef(model)
#> x x2
#> 3.6007872 0.6991214
# with ordinalNet
form <- y ~ x + x2
x <- model.matrix(form, dat)[, -1]
y <- model.frame(form, dat)[, 1, drop = TRUE]
rmodel <- ordinalNetCV(x, y)
#> Fitting ordinalNet on full training data
#> Fitting ordinalNet on fold 1 of 5
#> Fitting ordinalNet on fold 2 of 5
#> Fitting ordinalNet on fold 3 of 5
#> Fitting ordinalNet on fold 4 of 5
#> Fitting ordinalNet on fold 5 of 5
#> Done
coef(rmodel$fit)
#> (Intercept):1 (Intercept):2 x x2
#> 4.2554900 8.4649827 -3.4500300 -0.6367011
# why do coefficients have the wrong sign?
Created on 2023-03-20 with reprex v2.0.2