
I am working on a document classification problem with Python and scikit-learn. I used CountVectorizer to get word counts from the text documents and a MultinomialNB classifier for the class predictions. The model reaches 94.5% accuracy, and I am trying to improve it further. As part of that, I printed the confusion matrix, shown below. There are 32 false positives (the sum of all elements above the diagonal: 11+2+3+4+1+1+10 = 32). I am stuck here. Can the classification accuracy be improved from this point? If yes, please guide me through the steps.

      |   1   2   3   4   5   6   7   8 |
    --+---------------------------------+
    1 |<561> 11   .   .   .   2   .   . |
    2 |   7<313>  3   .   .   .   .   . |
    3 |   .   . <41>  .   .   .   .   4 |
    4 |   .   1   . <15>  .   1   .   . |
    5 |   .   .   1   .  <4>  .   .   . |
    6 |   .   .   .   .   . <45>  .   1 |
    7 |   .   .   1   .   .   . <36> 10 |
    8 |   .   .   1   .   .   .   3 <36>|
    --+---------------------------------+
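
For reference, here is a minimal sketch of how the matrix above can be broken down per class. It assumes that rows are the true classes and columns are the predicted classes; the post does not state the orientation, so treat that as an assumption. The variable names are only illustrative.

```python
import numpy as np

# Confusion matrix copied from the post ("." entries are zeros).
# Assumption: rows are true classes 1-8, columns are predicted classes 1-8.
cm = np.array([
    [561,  11,   0,   0,   0,   2,   0,   0],
    [  7, 313,   3,   0,   0,   0,   0,   0],
    [  0,   0,  41,   0,   0,   0,   0,   4],
    [  0,   1,   0,  15,   0,   1,   0,   0],
    [  0,   0,   1,   0,   4,   0,   0,   0],
    [  0,   0,   0,   0,   0,  45,   0,   1],
    [  0,   0,   1,   0,   0,   0,  36,  10],
    [  0,   0,   1,   0,   0,   0,   3,  36],
])

correct = np.diag(cm)
recall = correct / cm.sum(axis=1)     # per true class (row sums)
precision = correct / cm.sum(axis=0)  # per predicted class (column sums)

# Off-diagonal entries on either side of the diagonal are misclassifications.
above = int(np.triu(cm, k=1).sum())
below = int(np.tril(cm, k=-1).sum())

for cls, (p, r) in enumerate(zip(precision, recall), start=1):
    print(f"class {cls}: precision={p:.3f} recall={r:.3f}")
print(f"errors above diagonal: {above}, below diagonal: {below}")

# Classes with the weakest scores are the first candidates for a closer look.
worst_recall_class = int(np.argmin(recall)) + 1
worst_precision_class = int(np.argmin(precision)) + 1
print(f"lowest recall: class {worst_recall_class}, "
      f"lowest precision: class {worst_precision_class}")
```

Under that assumption, classes 7 and 8 stand out, since most of their errors are with each other (10 and 3 documents), so that pair may be the most useful place to start.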


Thanks!
Rizwan
  • Have you tried other classifiers? Ensemble classifiers usually do well on multi-class problems. You can try RandomForestClassifier. – AndreyF Feb 19 '17 at 06:06
  • Thanks for the comment. I have very limited knowledge of ensemble learning. I will go through a few tutorials, try RandomForestClassifier, and get back to you with my findings. Thanks! – Rizwan Feb 20 '17 at 14:36
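
A minimal sketch of the comparison suggested in the comments above, assuming the same CountVectorizer features feed both classifiers. The toy `docs` and `labels` are hypothetical placeholders, not the data from the question, and `cv=2` is chosen only because the toy corpus is tiny. The scores it prints say nothing about how the two models would rank on the real documents.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; replace with the real documents and class labels.
docs = [
    "stock market prices fall", "bank raises interest rates",
    "market shares and bank profits", "interest rates and stock prices",
    "team wins the football match", "player scores a late goal",
    "football team signs new player", "goal decides the match",
]
labels = [1, 1, 1, 1, 2, 2, 2, 2]

candidates = {
    "MultinomialNB": MultinomialNB(),
    "RandomForestClassifier": RandomForestClassifier(
        n_estimators=100, random_state=0
    ),
}

mean_accuracy = {}
for name, clf in candidates.items():
    # The vectorizer sits inside the pipeline so each fold fits its own vocabulary.
    model = make_pipeline(CountVectorizer(), clf)
    scores = cross_val_score(model, docs, labels, cv=2, scoring="accuracy")
    mean_accuracy[name] = float(scores.mean())
    print(f"{name}: mean accuracy {mean_accuracy[name]:.3f}")
```

Evaluating both models with the same cross-validation folds keeps the comparison fair; on the real data a larger `cv` value would be more appropriate.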
