I fitted a logistic regression model using cv.glmnet on a training dataset. When I predict on the test data and try to generate a confusion matrix, it throws an error. The classes in both the training and test sets are unbalanced.
Here are the dimensions of the training and test datasets. Both traindata and testdata come from one larger dataset of 60 rows and 1234 columns, which I split randomly into two sets so that I can check the sensitivity and specificity of the classification at the end.
> dim(traindata)
[1] 40 1234
> dim(testdata)
[1] 20 1234
And this is what I tried.
Subtype = factor(traindata$Subtype)
CV=cv.glmnet(x=data.matrix(traindata),y=Subtype,standardize=TRUE,alpha=0,nfolds=3,family="multinomial")
response_predict=predict(CV, data.matrix(testdata),type="response")
predicted = as.factor(names(response_predict)[1:3][apply(response_predict[1:3], 1, which.max)])
The last line throws this error:
Error in apply(response_predict[1:3], 1, which.max) :
dim(X) must have a positive length
My question is how to proceed with such an unbalanced dataset using cv.glmnet, and how to get rid of the error shown above.
Thank you
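
For reference, here is a minimal sketch of what appears to trigger the error, written in base R so it runs without glmnet. It assumes that predict() on a multinomial cv.glmnet fit with type="response" returns a three-dimensional array (observations x classes x 1) rather than a data frame. The simulated probabilities, the class names A, B, C and the truth vector are made up for illustration and are not from my data.

```r
# Hypothetical stand-in for predict(CV, newx, type = "response"):
# a 20 x 3 x 1 array of class probabilities (class names are invented).
set.seed(1)
raw <- matrix(runif(20 * 3), nrow = 20)
probs <- raw / rowSums(raw)  # each row sums to 1
response_predict <- array(probs, dim = c(20, 3, 1),
                          dimnames = list(NULL, c("A", "B", "C"), "1"))

# response_predict[1:3] picks the first three elements as a plain vector,
# so it has no dim attribute and apply() cannot iterate over rows.
is.null(dim(response_predict[1:3]))

# Drop the third dimension to get a 20 x 3 matrix, then take the
# row-wise position of the largest probability.
prob_matrix <- response_predict[, , 1]
predicted <- factor(colnames(prob_matrix)[apply(prob_matrix, 1, which.max)],
                    levels = colnames(prob_matrix))

# Confusion matrix against hypothetical true labels.
truth <- factor(sample(c("A", "B", "C"), 20, replace = TRUE),
                levels = c("A", "B", "C"))
table(Predicted = predicted, Actual = truth)
```

If that assumption holds, predict(CV, newx, type="class") should also return the predicted labels directly. Separately, data.matrix(traindata) in my code still contains the Subtype column, so the response is probably leaking into the predictors and should be dropped from x.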