I have some data that is mostly user demographics. There are a lot of survey questions that people have answered "yes" or "no", but the data naturally contains many missing values. I don't want to impute the missing values; I want to treat them as a third category, so that each question has three possible answers: "Yes", "No" and "NotSure".
What I have been doing so far is:
model = graphlab.boosted_trees_classifier.create(train,
    validation_set=None, target=target, max_iterations=80, verbose=False)
where target is what I am predicting (it is binary, 1 or -1). Both my train and test datasets have a lot of missing values, so to handle that I have been predicting with:
predictions = model.predict(test, missing_value_action='impute')
But these predictions are not giving me good accuracy. I want to convert each two-category answer (Yes/No) into a three-category one (Yes/No/NotSure). How do I go about doing that?
I tried:
colNames = train.column_names()
for i in colNames[6:]:
    train.fillna(i, 'NotSure')
This executes without any error but it doesn't work.
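To make the intent concrete, here is a minimal plain-Python sketch of what I want the loop to achieve, with no GraphLab dependency. The helper `fill_missing_as_category`, the column names `q1`/`q2`, and the toy rows are all hypothetical and only for illustration. The helper returns a new list instead of modifying its input, which may mirror what is happening with my SFrame.

```python
# Plain-Python stand-in for the SFrame loop; no GraphLab required.
# `fill_missing_as_category` is a hypothetical helper, not a GraphLab API.

def fill_missing_as_category(rows, columns, label="NotSure"):
    """Return new rows where a None (or absent) value in the given
    columns is replaced by `label`. The input rows are not modified."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = label
        filled.append(new_row)
    return filled


# Toy data (made up for illustration): None marks an unanswered question.
train_rows = [
    {"q1": "Yes", "q2": None, "target": 1},
    {"q1": None, "q2": "No", "target": -1},
]
survey_cols = ["q1", "q2"]

# Calling the helper without assigning the result leaves train_rows as it was.
fill_missing_as_category(train_rows, survey_cols)
unchanged = train_rows[0]["q2"] is None

# Assigning the result back is what actually replaces the missing values.
train_rows = fill_missing_as_category(train_rows, survey_cols)
```

If `SFrame.fillna` likewise returns a new SFrame rather than changing `train` in place (I believe it does, but I have not confirmed this), then the loop would presumably need `train = train.fillna(i, 'NotSure')`, with the same fill applied to `test` before predicting.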