2

I have data with an ordered factor response ("low", "medium", "high") and a large number of predictor variables. I'm pursuing an ordinal regularized regression model using the ordinalNet package.

This is the first time I've used this package, so I tested it on simulated data. However, the coefficients have the opposite sign that I'd expect.

On this simulated data, I expect a positive coefficient for x. This can be seen from the plot (apparent higher probability of medium and then high for increasing values of x). A non-regularized method (MASS::polr) produces the positive coefficient I expect. However, the coefficient for x produced by ordinalNet is negative.

Appreciate any help!

library(tidyverse)
library(MASS)
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
library(ordinalNet)

# simulated data
dat <- 
  expand_grid(
    repeats = 1:10,
    id = 1:3
  ) %>%
  mutate(
    x = id + runif(length(id), -1, 1),
    x2 = runif(length(id), -1, 1),
    y = factor(c("low", "med", "high"), c("low", "med", "high"), ordered = TRUE)[id],
  )
  
# inspect
dat
#> # A tibble: 30 × 5
#>    repeats    id     x     x2 y    
#>      <int> <int> <dbl>  <dbl> <ord>
#>  1       1     1 0.498  0.539 low  
#>  2       1     2 1.31  -0.406 med  
#>  3       1     3 2.30   0.774 high 
#>  4       2     1 0.163  0.408 low  
#>  5       2     2 1.99  -0.504 med  
#>  6       2     3 3.86   0.931 high 
#>  7       3     1 0.822 -0.799 low  
#>  8       3     2 1.96  -0.267 med  
#>  9       3     3 2.83   0.769 high 
#> 10       4     1 0.127  0.139 low  
#> # … with 20 more rows

# confirm that high values of x produce higher probability of medium and then high
dat %>%
  ggplot() + 
  aes(x, y) + 
  geom_point()


# with MASS::polr
# https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/
model <- polr(y ~ x + x2, data = dat)
coef(model)
#>         x        x2 
#> 3.6007872 0.6991214

# with ordinalNet
form <- y ~ x + x2
x <- model.matrix(form, dat)[, -1]
y <- model.frame(form, dat)[, 1, drop = TRUE]
rmodel <- ordinalNetCV(x, y)
#> Fitting ordinalNet on full training data
#> Fitting ordinalNet on fold 1 of 5 
#> Fitting ordinalNet on fold 2 of 5 
#> Fitting ordinalNet on fold 3 of 5 
#> Fitting ordinalNet on fold 4 of 5 
#> Fitting ordinalNet on fold 5 of 5 
#> Done
coef(rmodel$fit)
#> (Intercept):1 (Intercept):2             x            x2 
#>     4.2554900     8.4649827    -3.4500300    -0.6367011
# why do coefficients have the wrong sign?

Created on 2023-03-20 with reprex v2.0.2

Arthur
  • 1,248
  • 8
  • 14

1 Answers1

1

reverse Logical. If TRUE, then the “backward” form of the model is fit, i.e., the model is defined with response categories in reverse order. For example, the reverse cumulative model with K + 1 response categories applies the link function to the cumulative probabilities P(Y ≥ 2), . . . , P(Y ≥ K + 1), rather then P(Y ≤ 1), . . . , P(Y ≤ K).

Dear Arthur I've came from the exact problem as you, i don't know if my answer is correct or not.When we fit a cumulative model, we should really be careful to the cumulative direction we use the link function. In this case you can add argument reverse=TRUE in the formula.

The website below may helps you interpret the meaning of coefficients more https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/