
I have some data that is mostly user demographics. There are a lot of survey questions that people have answered "Yes" or "No", but the data naturally contains many missing values. I don't want to impute the missing values; I want to treat them as a third category, so that each question has three possible answers: "Yes", "No" and "NotSure".
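To make the idea concrete, here is a minimal sketch in plain Python (no GraphLab) of recoding one question's answers so that a missing answer becomes its own category instead of being imputed. It assumes missing answers arrive as None; the function name recode_answer is hypothetical.

```python
def recode_answer(answer, missing_label="NotSure"):
    """Map a Yes/No answer to a three-way category.

    Assumes a missing answer is represented as None; it is kept as an
    explicit category (missing_label) rather than filled with a guess.
    """
    if answer is None:
        return missing_label
    return answer


answers = ["Yes", None, "No", None]
recoded = [recode_answer(a) for a in answers]
print(recoded)  # ['Yes', 'NotSure', 'No', 'NotSure']
```

With this encoding, "NotSure" is just another category value, so a tree model can split on it the same way it splits on "Yes" or "No".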

So far, this is what I have been doing:

model = graphlab.boosted_trees_classifier.create(train,
                                                 validation_set=None,
                                                 target=target,
                                                 max_iterations=80,
                                                 verbose=False)

where target is the name of the column I am predicting (it is binary: 1 or -1). Both my train and test datasets have a lot of missing values, and so far I have been handling them at prediction time with:

predictions = model.predict(test, missing_value_action='impute')

But these predictions do not give me good accuracy. I want to convert each two-category answer (Yes/No) into a three-category answer (Yes/No/NotSure). How should I go about doing that?

I tried:

colNames = train.column_names()
for i in colNames[6:]:
    train.fillna(i,'NotSure')

This executes without any error, but it doesn't work: the missing values are not replaced with 'NotSure'.
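A likely cause: SFrame.fillna does not modify the SFrame in place. It returns a new SFrame with the missing values filled, so in the loop above each result is discarded and train is left unchanged. Reassigning inside the loop (train = train.fillna(i, 'NotSure')) should keep the change, and test needs the same treatment so both datasets use the same three categories. Below is a minimal sketch of that fix as a helper. The name fill_not_sure is hypothetical, and it assumes the question columns are string-typed (fillna tries to convert the fill value to the column's type and raises an error if it cannot) and that missing answers are stored as real missing values (None), not as empty strings.

```python
def fill_not_sure(sf, columns, label="NotSure"):
    """Return a copy of sf with missing values in `columns` replaced by `label`.

    Only relies on sf.fillna(column_name, value) returning a NEW frame,
    as graphlab's SFrame.fillna does, so the result is reassigned on
    every iteration instead of being thrown away.
    """
    for col in columns:
        sf = sf.fillna(col, label)
    return sf


# Usage (assumes train and test are graphlab SFrames and the survey
# questions start at column index 6, as in the attempt above):
# question_cols = train.column_names()[6:]
# train = fill_not_sure(train, question_cols)
# test = fill_not_sure(test, question_cols)
```

After this, 'NotSure' should be an ordinary string category in those columns, so the model can learn from it directly instead of relying on imputation for those values.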

Karup
