I'm currently using various classifiers in Weka.

My testing data is labelled, e.g.:

@relation bmwreponses

@attribute IncomeBracket {0,1,2,3,4,5,6,7}
@attribute FirstPurchase numeric
@attribute LastPurchase numeric
@attribute responded {1,0}

@data
4,200210,200601,0
5,200301,200601,1
6,200411,200601,0
5,199609,200603,0
6,200310,200512,1
...

The last value in each row is the class attribute, i.e. responded.

But if I try unlabelled test data, e.g.:

@relation bmwreponses

@attribute IncomeBracket {0,1,2,3,4,5,6,7}
@attribute FirstPurchase numeric
@attribute LastPurchase numeric
@attribute responded {1,0}

@data
4,200210,200601,?
5,200301,200601,1
6,200411,200601,?
5,199609,200603,0
6,200310,200512,?
...

Weka will carry out the classification but ignore the unlabelled rows. So of the five rows shown above, the test will only include the two that still have a class value (the second and fourth).

Does anyone know how to get around this? Should I have the class attribute declared in the test file or am I missing something?

Mr Morgan.

  • You need to declare the value you want to predict in the test file because otherwise it can't evaluate how well the model actually does -- you could get the predictions, but wouldn't be able to tell whether they are correct or not. – Lars Kotthoff Apr 08 '13 at 13:40
  • In the case of my data, I know the class of each element in the test data and it is provided for classification. But I wonder about a 'naive' test dataset without the class for other Weka operations, clustering perhaps? – Mr Morgan Apr 08 '13 at 13:44
  • For clustering (and other unsupervised methods) you don't need the labels -- but that's a different task than classification. – Lars Kotthoff Apr 08 '13 at 14:40
  • I agree. But somehow I think I will need to handle this eventuality. Thanks Lars. – Mr Morgan Apr 08 '13 at 15:18
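
To illustrate the point made in the comments (predictions can be produced for every row, but only labelled rows can be scored), here is a minimal sketch in plain Python. It does not use Weka's API. The `toy_classifier` rule is a hypothetical stand-in for a trained model, and the data rows are the five shown in the unlabelled test file above.

```python
# Rows from the unlabelled test file in the question; "?" is ARFF's missing-value marker.
ARFF_ROWS = """\
4,200210,200601,?
5,200301,200601,1
6,200411,200601,?
5,199609,200603,0
6,200310,200512,?
"""


def parse_rows(text):
    """Split each data row into (features, label); label is None when the class is '?'."""
    rows = []
    for line in text.strip().splitlines():
        *features, label = line.split(",")
        rows.append(([int(v) for v in features], None if label == "?" else label))
    return rows


def toy_classifier(features):
    """Hypothetical stand-in for a trained model (assumption, not from the question):
    predicts '1' when IncomeBracket is 5 or higher, otherwise '0'."""
    income_bracket = features[0]
    return "1" if income_bracket >= 5 else "0"


rows = parse_rows(ARFF_ROWS)

# A prediction can be produced for every row, labelled or not.
predictions = [toy_classifier(features) for features, _ in rows]

# Accuracy can only be computed where a true label exists to compare against.
scored = [(pred, label) for pred, (_, label) in zip(predictions, rows) if label is not None]
correct = sum(pred == label for pred, label in scored)

print("predictions for all rows:", predictions)
print("rows usable for evaluation:", len(scored), "of", len(rows))
print("accuracy on labelled rows:", correct / len(scored))
```

In this sketch all five rows receive a prediction, but only the two rows with a known class contribute to the accuracy figure, which matches the behaviour described in the question.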
