I am working on a classification problem. I have a training data set in which every observation is labeled with one of 3 levels: low, medium or high. I want to use the glmnet package in R to build a model that classifies observations into low, medium or high based on some features in the set, and then use this model to classify a testing data set into these 3 levels. The problem is that I need the predicted probability of low, medium and high for each observation in the testing data set. For example, for one observation in the testing set, I need to output a file showing that the probability of it being classified as low is 0.78, medium 0.20, and high 0.02. How can I get this? Should I fit a separate model for each level?
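To make the goal concrete, here is a minimal sketch of one way to get a per-observation probability for each level from a single model: fit glmnet with `family = "multinomial"` and call `predict()` with `type = "response"`. The data below are simulated, and the names `x`, `y`, `train`, `test` and the output file name are assumptions for illustration only, not part of my real data set.

```r
library(glmnet)

set.seed(1)

# Simulated stand-in data (assumption): 300 observations, 10 numeric features.
n <- 300
p <- 10
x <- matrix(rnorm(n * p), n, p)

# Hypothetical 3-level outcome built from a noisy score.
score <- x[, 1] + 0.5 * x[, 2] + rnorm(n)
y <- cut(score,
         breaks = quantile(score, c(0, 1/3, 2/3, 1)),
         labels = c("low", "medium", "high"),
         include.lowest = TRUE)

train <- 1:200
test  <- 201:300

# One multinomial model for all three levels, with lambda chosen by cross-validation.
cvfit <- cv.glmnet(x[train, ], y[train], family = "multinomial")

# type = "response" returns class probabilities as an
# (observations x classes x lambdas) array; keep the single lambda slice.
prob <- predict(cvfit, newx = x[test, ], s = "lambda.min", type = "response")
prob <- prob[, , 1]

# One row per test observation, one column per level; each row sums to 1.
out <- data.frame(id = test, round(prob, 2))

# Write the probabilities to a file (a temporary path here, as an assumption).
out_file <- file.path(tempdir(), "class_probabilities.csv")
write.csv(out, out_file, row.names = FALSE)
```

With a single multinomial fit, the three probabilities in each row come from the same model, so they sum to 1 without any extra normalization.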
-
So you just need to know the predicted labels in your testing data? If you had sample data on here I could run it exactly for you, but you can just run prop.table() on your results and that will give you proportions – jwells Mar 26 '17 at 16:21
-
Thanks for the reply. However, the dataset I work with has too many features and it is hard to display here. – Sanguis Mar 26 '17 at 20:58
-
I understand. But does that function give you what you were looking for? Usually, the results would come in a vector with the levels of your predicted factor. prop.table() gives you the proportions of the vector – jwells Mar 26 '17 at 21:18
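For reference, a small sketch of what the `prop.table()` suggestion computes when the predictions are class labels. The label vector `pred` is made up for illustration. Note that this gives the share of each predicted level across the whole test set, which is not the same thing as a probability for each individual observation.

```r
# Made-up predicted labels for five test observations (assumption).
pred <- factor(c("low", "low", "medium", "high", "low"),
               levels = c("low", "medium", "high"))

# table() counts each level; prop.table() converts the counts to proportions.
shares <- prop.table(table(pred))
print(shares)
# low = 0.6, medium = 0.2, high = 0.2
```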
-
predict() does output a vector of numbers between 0 and 1. I use cv.glmnet() with type.measure = "auc", and inside predict() I set type = "response". I am not sure whether the output is a probability. Also, btw, I coded the levels "low", "medium" and "high" as binary indicators: if an observation falls in "low", the low indicator is 1 and the other 2 indicators are 0. I ran cv.glmnet() for each level (treating the 0/1 indicator vector as y), which gives 3 models, and used predict() on each model – Sanguis Mar 26 '17 at 21:40
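A minimal sketch of the three-model setup described in this comment, on simulated data (the names `x`, `y`, `train`, `test` and `ovr` are assumptions for illustration). With `family = "binomial"`, `predict(..., type = "response")` returns a fitted probability for each observation, but because the three models are fit separately, the three values for one observation need not sum to 1. Dividing each row by its sum is one simple, heuristic way to rescale them.

```r
library(glmnet)

set.seed(1)

# Simulated stand-in data (assumption): 300 observations, 10 numeric features.
n <- 300
p <- 10
x <- matrix(rnorm(n * p), n, p)
score <- x[, 1] + 0.5 * x[, 2] + rnorm(n)
y <- cut(score,
         breaks = quantile(score, c(0, 1/3, 2/3, 1)),
         labels = c("low", "medium", "high"),
         include.lowest = TRUE)

train <- 1:200
test  <- 201:300

# One binomial model per level: y is 1 for that level and 0 otherwise.
ovr <- sapply(levels(y), function(lev) {
  fit <- cv.glmnet(x[train, ], as.integer(y[train] == lev),
                   family = "binomial", type.measure = "auc")
  # Fitted probability that each test observation belongs to `lev`.
  as.numeric(predict(fit, newx = x[test, ], s = "lambda.min", type = "response"))
})

# Each column is a probability in [0, 1], but rows do not have to sum to 1.
summary(rowSums(ovr))

# Heuristic rescaling so that each row sums to 1.
ovr_norm <- ovr / rowSums(ovr)
```

The rescaled values are only an approximation; a single `family = "multinomial"` fit yields probabilities that sum to 1 by construction.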