I've been mucking around for about a week now, trying to figure this one out, so any help would be greatly appreciated.

I've got a data set with a binary target, and continuous predictors.

The input looks like this (with more variables, but you get the idea - it's pretty sparse):

18.425           0             0             0             0
0.000            0             0             0             0
0.000            0             0             0             0
0.000            0             0             3.234         0
0.000            0             0             0             0

The target is binary, 0 or 1, and also quite sparse:

0 1 0 0 0

I'm trying the following code:

ridge_fit <- glmnet(x = as.matrix(train_input), 
                y = as.factor(train_target),
                family="binomial")
ridge_predict <- predict.glmnet(ridge_fit, 
                            newx = test_input, 
                            type = 'class')

And getting output like this:

s0        s1        s2        s3        s4
-3.391069 -3.396630 -3.400896 -3.404444 -3.407538
-3.391069 -3.388934 -3.388549 -3.388796 -3.389314
-3.391069 -3.396621 -3.400882 -3.404427 -3.407517
-3.391069 -3.396630 -3.400896 -3.404444 -3.407538
-3.391069 -3.396630 -3.400896 -3.404444 -3.407538

I've tried changing the family when fitting and the type when predicting, passing the data as a factor and as a matrix, different alpha values (I'm aiming for ridge, but willing to try anything that works at this point), and different lambda sequences. I also tried some smaller data sets, but then entire variables were null and some errors cropped up.

I'm super, super confused about what else I can try. The data set works fine for regression, but glmnet keeps spitting out regression-ish values when I try it with a classification target.

No idea what to do next . . . thanks in advance for any feedback!

1 Answer

There are several things here:

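  1. Use the S3 generic predict() instead of calling predict.glmnet() directly. class(ridge_fit) is c("lognet", "glmnet"), so predict() dispatches to predict.lognet first, and that is the method that turns the linear predictor into classes or probabilities. Calling predict.glmnet() yourself skips that step, which is why you see log-odds-like values. If you need probabilities, use type = 'response'.
  2. The result is a matrix. Each column (s0, s1, ...) corresponds to one lambda value on the regularization path. The lambda values are stored in the fitted object as ridge_fit$lambda.
  3. If you need a single prediction per observation, consider using cv.glmnet() to pick the optimal lambda by cross-validation.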
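
Putting the three points together, here is a minimal sketch rather than a drop-in fix. The simulated x and y are hypothetical stand-ins for train_input and train_target, the train/test split is arbitrary, and alpha = 0 is my assumption for getting ridge (glmnet defaults to alpha = 1, the lasso).

```r
library(glmnet)

set.seed(1)

# Hypothetical sparse predictors and binary target (stand-ins for the real data)
n <- 200
p <- 5
x <- matrix(rexp(n * p) * rbinom(n * p, 1, 0.2), nrow = n, ncol = p)
y <- rbinom(n, 1, plogis(-1 + x[, 1]))

train <- 1:150
test  <- 151:200

# alpha = 0 requests the ridge penalty (assumption: ridge is what is wanted)
ridge_fit <- glmnet(x = x[train, ], y = as.factor(y[train]),
                    family = "binomial", alpha = 0)

# Point 1: call the generic so that predict.lognet handles type = "class"
class_path <- predict(ridge_fit, newx = x[test, ], type = "class")

# Point 2: one column per lambda on the path
n_lambda <- length(ridge_fit$lambda)
dim(class_path)

# Point 3: pick a single lambda by cross-validation, then predict at it
cv_fit <- cv.glmnet(x = x[train, ], y = as.factor(y[train]),
                    family = "binomial", alpha = 0)
prob <- predict(cv_fit, newx = x[test, ], s = "lambda.min", type = "response")
cls  <- predict(cv_fit, newx = x[test, ], s = "lambda.min", type = "class")
```

Here prob holds one predicted probability per test row and cls holds the matching class labels as character strings. Note that newx has to be a numeric matrix, so with a data frame you would wrap it in as.matrix() as you already do for x.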
Dmitriy Selivanov