
I am working on a descriptive (NOT predictive) analysis in which I wish to compare the magnitudes of the coefficients from a logistic-regression-type problem, including the intercept. Since each variable needs to be described, I have tried standard glm logit regression, and, knowing that many of the variables are at least partially correlated, I am also trying out ridge regression to see how it differs.

The issue I have is that all the guides I've seen recommend extracting coefficients at lambda.min or lambda.1se; however, for me the coefficients at these values of lambda are all zero. I can arbitrarily select a lambda that returns non-zero values, but I don't know whether that is correct.

require(glmnet)

CT.base <- readRDS('CTBaseObj.rds') # read in the prepared data object

# Standard logit regression
regular <- glm(Flag ~ . - Occurrences, family = binomial(link = "logit"),
               data = CT.base, weights = Occurrences, maxit = 50)

# Ridge regression
x <- model.matrix(Flag ~ . - Occurrences, CT.base)
x <- x[, !colnames(x) %in% '(Intercept)'] # drop the intercept column; glmnet adds its own
y <- CT.base$Flag
w <- CT.base$Occurrences

CT.cv <- cv.glmnet(x, y, family = "binomial", weights = w,
                   alpha = 0, parallel = TRUE, type.measure = "class")
plot(CT.cv)

[plot of the cv.glmnet cross-validation results]

#CT.reg <- coef(CT.cv, s = CT.cv$lambda.1se) # coefficients here are zero
CT.reg <- coef(CT.cv, s = -3) # Looks like an interesting value!? (note: s is lambda itself, not the log(lambda) shown on the plot)
CT.reg <- data.frame(name = CT.reg@Dimnames[[1]][CT.reg@i + 1], # names of the non-zero coefficients
                     coefficient = CT.reg@x)                    # and their values
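
To see how the coefficients behave across the whole range of lambda, the full shrinkage path can also be plotted; a minimal sketch using the glmnet.fit component of the cv.glmnet object:

plot(CT.cv$glmnet.fit, xvar = "lambda", label = TRUE) # coefficient paths vs log(lambda)
abline(v = log(CT.cv$lambda.min), lty = 2)            # mark lambda.min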

I've linked the data set behind this for reproducibility (https://drive.google.com/open?id=1YMkY-WWtKSwRREqGPkSVfsURaImItEiO) but this may not be necessary! Any advice gladly received.

Thanks.

Jon
  • I think the absurd weights you imposed on the subjects cause this behavior. Having one weight of 20000000 while many others are 1 clearly messes with the model. Only the observation with the highest weight is given a chance to be predicted, so all coefficients can shrink to zero. – missuse Feb 10 '18 at 11:58

1 Answer


The problem with your model is the very large imbalance in the weights you impose on the observations: one of the weights is 20000000, while many are 1 and none of the others exceeds 10000.

par(mfrow = c(1, 2))
boxplot(w)      # raw weights: a single extreme outlier dominates
boxplot(log(w)) # log scale makes the spread of the remaining weights visible

[boxplots of the weights w, on the raw and log scales]

In such a scenario the model can do little except always predict the class of the observation with the immense weight, shrinking all the coefficients to zero. You can see this as follows:

CT <- glmnet(x, y, family = "binomial",
             weights = w, alpha = 0)

# every observation is predicted as class 0
all(predict(CT, x, s = CT.cv$lambda.min, type = "class") == 0)
#TRUE

# the observation carrying the largest weight belongs to class 0
y[which.max(w)]
#0
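
As a quick sanity check of how dominant that single observation is (a small sketch using the w vector from the question):

max(w) / sum(w)                 # share of the total weight held by one observation
max(w) / sum(w[-which.max(w)])  # ratio of that weight to all the others combined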

I am uncertain what the context of these weights is; consider fitting a model without them:

CT.cv <- cv.glmnet(x, y, family = "binomial",
                   alpha = 0, type.measure = "class")

Another problem: the model without the weights behaves in much the same way as the model with the absurd weights, shrinking the coefficients to zero and predicting only class 0. This is probably caused by class imbalance:

table(y)
y
  0   1 
474  75 

And by the fact that the model minimizes misclassification when it predicts only the more abundant class. This can be accounted for by increasing the weights associated with class 1 members, or by picking another metric for model fitting, such as balanced accuracy, balanced error rate, or area under the precision-recall curve. Unfortunately the glmnet package does not offer these metrics, but you might look at the caret or mlr packages.
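
For the first option, a minimal sketch of class re-weighting with cv.glmnet (the inverse-frequency weights below are one common but arbitrary choice, not something tuned here):

cls_w <- ifelse(y == 1, sum(y == 0) / sum(y == 1), 1) # up-weight class 1 so both classes contribute equally
CT.cv.bal <- cv.glmnet(x, y, family = "binomial",
                       weights = cls_w, alpha = 0, type.measure = "class")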

If you use auc:

CT.cv <- cv.glmnet(x, y, family = "binomial",
                   alpha = 0, type.measure = "auc")
plot(CT.cv)

[plot of the cv.glmnet results with the AUC measure]

Unfortunately this yields the same pattern:

CT <- glmnet(x, y, family = "binomial",
             alpha = 0, lambda = CT.cv$lambda.min)

all(predict(CT, x, s = CT.cv$lambda.min, type = "class") == 0)
#TRUE
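
Rather than relying on the default 0.5 cutoff, you can also inspect the predicted probabilities directly and apply a lower threshold; a small sketch (the 0.15 cutoff is purely illustrative):

p <- predict(CT, x, s = CT.cv$lambda.min, type = "response") # fitted probabilities
table(pred = as.integer(p > 0.15), obs = y)                  # confusion table at the lower cutoff

This is essentially what the threshold tuning below automates.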

I will show you how to tune using balanced accuracy and the mlr library:

library(mlr)

make a learner:

lrn <- makeLearner("classif.glmnet", predict.type = 'prob', alpha = 0)

get all tunable parameters:

getParamSet(lrn)

create a train data set:

mlr_train <- data.frame(x,
                        y = as.factor(y))

create a tune task:

task <- makeClassifTask(data = mlr_train, target = "y",  positive = "1")

the tune control will be a grid search over 200 values, explored with probability threshold tuning:

ctrl <- makeTuneControlGrid(resolution = 200, tune.threshold = TRUE)

we will tune only lambda, in the range 0 to 10:

ps <- makeParamSet(
  makeNumericParam("lambda", lower = 0, upper = 10))

cv10 denotes 10-fold CV; bac is balanced accuracy, which will be used as the selection metric:

z <- tuneParams(lrn, task, cv10, par.set = ps, control = ctrl, 
          show.info = TRUE, measures = list(bac, setAggregation(bac, test.sd)))

> z
Tune result:
Op. pars: lambda=0.0503
Threshold: 0.07
bac.test.mean=0.5615910,bac.test.sd=0.0394965

So in order to maximize balanced accuracy you need to choose lambda = 0.0503 and a probability threshold of 0.07. Given such a low threshold, I would give up on this approach and go back to tuning the weights of the positive class.
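
To then fit a final model with the tuned values, a minimal sketch (assuming the TuneResult object z exposes the tuned parameters as z$x and the threshold as z$threshold):

lrn_tuned <- setHyperPars(lrn, lambda = z$x$lambda)    # fix lambda at the tuned value
mod <- train(lrn_tuned, task)                          # fit on the full task
pred <- setThreshold(predict(mod, task), z$threshold)  # apply the tuned probability threshold
performance(pred, measures = bac)                      # balanced accuracy of the final model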

missuse
  • Thank you for your extensive answer - appreciate the time taken. The absurd weights are actually real-life findings, and we introduced them to the logit model fairly early on without a huge shift in the coefficient estimates - it's a population weighting. I appreciate ridge may differ in how it approaches this, and I may have to revert to standard logit, as it really is the descriptive coefficients rather than predictive power I need. – Jon Feb 12 '18 at 08:37
  • @Jon Perhaps you are interested just in the probabilities and how the coefficients affect them. To me it does not look like ridge could provide useful insights in this case, given the very modest balanced accuracy at a very low probability threshold, but I might be wrong, since I have never actually explored it in such a way. – missuse Feb 12 '18 at 09:14