
I want to classify fault and no-fault conditions for a device: label A for fault and label B for no-fault.

scikit-learn gives me a classification report as:

        precision    recall   f1-score   support
A       0.82         0.18     0.30        2565
B       0.96         1.00     0.98       45100
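(For reference, a report like this is produced by sklearn.metrics.classification_report; a minimal sketch with made-up y_true / y_pred arrays, not my real data:)

    # Minimal sketch: generating a classification report like the one above.
    # y_true / y_pred are tiny made-up arrays, not the real device data.
    from sklearn.metrics import classification_report

    y_true = ["A", "B", "B", "A", "B", "B", "B", "A"]  # ground-truth labels
    y_pred = ["B", "B", "B", "A", "B", "A", "B", "B"]  # model predictions

    print(classification_report(y_true, y_pred, labels=["A", "B"]))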

Now, which of the A or B results should I use to characterize the model's performance?

Ali Ok
  • The data set is heavily imbalanced; as such, only the precision, recall and f1-score of A give information about the model. – Frayal Oct 17 '19 at 09:29

1 Answer


Introduction

There's no single score that can universally describe a model; it all depends on your objective. In your case you're dealing with fault detection, so you're interested in finding faults among a much greater number of non-fault cases. The same logic applies to, e.g., screening a population for individuals carrying a pathogen.

In such cases it's typically very important to have high recall (also known as sensitivity) for the "fault" cases (or, in the medical analogy, the "you might be ill" cases). In such screening it's typically fine to label as "fault" something that actually works fine - that is your false positive. Why? Because the cost of missing a faulty part in an engine, or a tumor, is much greater than the cost of asking an engineer or doctor to verify the case.
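To make this concrete, recall for the fault class alone can be computed directly; a minimal sketch, reusing made-up y_true / y_pred arrays and assuming "A" is the fault label:

    # Sketch: recall (sensitivity) for the fault class only.
    # pos_label="A" tells scikit-learn which class counts as positive.
    from sklearn.metrics import recall_score

    y_true = ["A", "B", "B", "A", "B", "B", "B", "A"]
    y_pred = ["B", "B", "B", "A", "B", "A", "B", "B"]

    print(recall_score(y_true, y_pred, pos_label="A"))  # fraction of real faults found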

Solution

Assuming that recall for faults is indeed the most important metric in your case, you should be looking at recall for label A (faults). By that standard, your model is doing rather poorly: it finds only 18% of the faults. Much of this likely stems from the fact that the number of faults is ~20x smaller than the number of non-faults, introducing a heavy bias that needs to be tackled.
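One common first step against that bias (a sketch of one option, not necessarily what you should use; the synthetic data merely mimics the ~5% fault rate) is to weight the classes inversely to their frequency, which many scikit-learn estimators support via class_weight="balanced":

    # Sketch: counteracting class imbalance with class weighting.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Synthetic stand-in for the data: ~5% positives, like the fault class.
    X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

    # Misclassifying the rare class now costs proportionally more.
    model = SVC(class_weight="balanced").fit(X, y)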

I can think of a number of scenarios where this score would not actually be bad. If you can detect 18% of all faults in an engine (on top of other systems) without introducing false alarms, that can be really useful - you don't want to fire alarms at the driver too often while everything is fine. At the same time, you likely don't want to apply the same logic to, e.g., cancer detection and tell a patient "everything's OK" while there's a very high risk that the diagnosis is wrong.
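If the model exposes probabilities, that trade-off between missed faults and false alarms can also be tuned after training by moving the decision threshold; a sketch on synthetic data (the 0.2 cut-off is an arbitrary illustration):

    # Sketch: trading false alarms for recall via the decision threshold.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import recall_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]  # probability of the rare class

    y_default = (proba >= 0.5).astype(int)  # standard threshold
    y_lowered = (proba >= 0.2).astype(int)  # flags more candidate faults

    print(recall_score(y_te, y_default), recall_score(y_te, y_lowered))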

Metrics

For the sake of completeness, I will explain the terms. Consider these definitions:


  • tp - true positive (a real fault, detected as a fault)
  • tn - true negative (not a fault, detected as OK)
  • fp - false positive (detected as a fault, while it's OK)
  • fn - false negative (detected as OK, while it's a fault)
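From those counts, the metrics in the report follow as:

    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1-score  = 2 * precision * recall / (precision + recall)

And a minimal sketch of extracting the counts with scikit-learn (same made-up labels as before, with "A" as the positive/fault class):

    # Sketch: reading tp/fn/fp/tn out of a confusion matrix.
    from sklearn.metrics import confusion_matrix

    y_true = ["A", "B", "B", "A", "B", "B", "B", "A"]
    y_pred = ["B", "B", "B", "A", "B", "A", "B", "B"]

    # With labels=["A", "B"]: rows are truth, columns are predictions.
    tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=["A", "B"]).ravel()
    print(tp, fn, fp, tn)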

Here is one article that attempts to explain nicely what precision, recall and F1 are.

Lukasz Tracewski
  • Thank you for the complete answer. As you mentioned, just ~5% of my data relates to fault cases (the ~20x you said); in reality the ratios are like that. Do you have any suggestions for improving the results? (I have similar problems with the models I used: SVM, KNN, and DT.) Thank you again. – Ali Ok Oct 17 '19 at 13:13
  • @AliOk I am glad it helped. Indeed, it's a typical scenario to have a very heavy class imbalance. There's a ton of research and many different approaches to the problem, but they're off-topic for your question. If I answered your question, accept / upvote the answer and then ask a new, preferably rather specific, one. To get you started, look e.g. at techniques like SMOTE and its weighted modification (WSMOTE). Simply choosing a different algorithm won't solve your problem by itself. Comments have too few characters for complete answers. – Lukasz Tracewski Oct 17 '19 at 14:16
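(For reference, a minimal sketch of the SMOTE oversampling mentioned above, using the imbalanced-learn package on synthetic stand-in data:)

    # Sketch: oversampling the minority (fault) class with SMOTE.
    # Requires the imbalanced-learn package (pip install imbalanced-learn).
    from collections import Counter

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

    print(Counter(y), Counter(y_res))  # classes are balanced after resampling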