I am analyzing the Secom dataset from the UCI Machine Learning repository by lasso-regularized logistic regression, but the results are bad.
https://archive.ics.uci.edu/ml/datasets/SECOM
Characteristics:
- 1546 data samples with 590 numeric attributes
- 106 positive samples (production failures)
The goal is to predict the positive class accurately, and also to perform feature selection.
I optimize lambda by 10 fold cross validation with the glmnet package in R. But the results are terrible, as the model tends to assign all test samples to only a single class.
Is it simply the wrong kind of model for this dataset?