I wrote a simple Naive Bayes word classifier. In simple terms, what it does is:

...
train( "some text A ...", "categoryA" );
train( "some text A ...", "categoryA" );
train( "some text B ...", "categoryB" );
train( "some text B ...", "categoryB" );
...
myclass category = GetCategory( "some new text" );
EXPECT_EQ( "categoryA"|"categoryB", category.Id);
EXPECT_EQ( xyz%, category.Percent);
...

While this works in practice, I was wondering if there is a better way of unit testing the classification of a document.

Would 3, 4 or ... categories make the test more reliable?

What would be a good suite of tests for my function?

Has QUIT--Anony-Mousse
FFMG
  • You can't write unit tests for machine learning algorithms! Are you actually trying to evaluate your classification model? – eliasah Sep 20 '15 at 09:38
  • Of course you can write tests (especially unit tests); the small example I gave is a valid test. I was just asking whether the simple test I gave is good enough to test the algorithm on known data, or whether there are more robust ways. – FFMG Sep 20 '15 at 10:00
  • In computer programming, unit testing is a software testing method by which individual units of source code, sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures, are tested to determine whether they are fit for use. For machine learning, you do test harnessing or model evaluation! I suggest that you read [this](https://www.cs.upc.edu/~marias/papers/seke07.pdf) paper concerning a software testing approach to machine learning applications. – eliasah Sep 20 '15 at 10:05
  • "... method by which individual units of source code, sets of one or more computer program modules together with associated control data ...", that's exactly the example I gave. But please, let's move on from that now; this is not helping my question. You can ask a new question if you need help with unit testing. – FFMG Sep 20 '15 at 10:16
  • I don't need to ask any question about unit testing. I'm just telling you that it's useless to unit test a machine learning algorithm. You need to evaluate it! But whatever. – eliasah Sep 20 '15 at 10:39

1 Answer

Design your test so that it is on the borderline of the decision function.

If the solution is too obvious, you will not notice a regression.

I suggest testing single examples, but with manually checked likelihoods for each class, not only the final class decision. You want to see that your method computed the correct probability.
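A minimal sketch of such a test, assuming a multinomial Naive Bayes with add-one (Laplace) smoothing, which may differ from the classifier in the question. The class `TinyNaiveBayes`, its `posteriors` method, and the training strings are hypothetical stand-ins. The example is chosen so that the prior favours category A while the word likelihood favours category B, and the expected probabilities are worked out by hand in the comments.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical minimal multinomial Naive Bayes with add-one smoothing.
class TinyNaiveBayes {
public:
    void train(const std::string& text, const std::string& category) {
        ++docCount_[category];
        ++totalDocs_;
        std::istringstream words(text);
        std::string word;
        while (words >> word) {
            ++wordCount_[category][word];
            ++totalWords_[category];
            vocabulary_.insert(word);
        }
    }

    // Returns the normalised P(category | text) for every trained category.
    std::map<std::string, double> posteriors(const std::string& text) const {
        std::map<std::string, double> scores;
        double sum = 0.0;
        for (const auto& entry : docCount_) {
            const std::string& category = entry.first;
            // Start from the log prior, then add one log likelihood per word.
            double logScore =
                std::log(static_cast<double>(entry.second) / totalDocs_);
            std::istringstream words(text);
            std::string word;
            while (words >> word) {
                logScore += std::log(likelihood(word, category));
            }
            scores[category] = std::exp(logScore);
            sum += scores[category];
        }
        for (auto& entry : scores) {
            entry.second /= sum;
        }
        return scores;
    }

private:
    // P(word | category) = (count + 1) / (words in category + vocabulary size)
    double likelihood(const std::string& word,
                      const std::string& category) const {
        int count = 0;
        auto perCategory = wordCount_.find(category);
        if (perCategory != wordCount_.end()) {
            auto it = perCategory->second.find(word);
            if (it != perCategory->second.end()) count = it->second;
        }
        int total = 0;
        auto totalIt = totalWords_.find(category);
        if (totalIt != totalWords_.end()) total = totalIt->second;
        return (count + 1.0) / (total + static_cast<double>(vocabulary_.size()));
    }

    std::map<std::string, int> docCount_;
    std::map<std::string, int> totalWords_;
    std::map<std::string, std::map<std::string, int>> wordCount_;
    std::set<std::string> vocabulary_;
    int totalDocs_ = 0;
};

// Hand calculation for the query "fish" (vocabulary: cat, dog, fish):
//   P(A) = 2/3, P(fish | A) = (0 + 1) / (4 + 3) = 1/7  ->  2/21
//   P(B) = 1/3, P(fish | B) = (1 + 1) / (2 + 3) = 2/5  ->  2/15
//   P(A | fish) = (2/21) / (2/21 + 2/15) = 5/12
//   P(B | fish) = 7/12
void testBorderlinePosterior() {
    TinyNaiveBayes nb;
    nb.train("cat dog", "categoryA");
    nb.train("cat dog", "categoryA");
    nb.train("cat fish", "categoryB");

    const auto p = nb.posteriors("fish");
    assert(std::fabs(p.at("categoryA") - 5.0 / 12.0) < 1e-9);
    assert(std::fabs(p.at("categoryB") - 7.0 / 12.0) < 1e-9);
}
```

With googletest, the two `assert` lines could be written as `EXPECT_NEAR(p.at("categoryA"), 5.0 / 12.0, 1e-9)` and likewise for category B. Because the margin is narrow, this example would likely fail if the prior were dropped (category A would come out at 5/19) or if smoothing were removed (category A would come out at 0), whereas a test that checks only the winning class would still pass in both cases.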

Has QUIT--Anony-Mousse
  • Thanks. To make sure I understand: you are saying that I should have results very close to each other, i.e. have my dataset cause categories A and B to fall within a very small range (category A at 80% probability and category B at 85%, for example)? – FFMG Sep 20 '15 at 17:59
  • Yes, always make tests that are *easy to fail*. – Has QUIT--Anony-Mousse Sep 20 '15 at 18:29