
Gain/Lift Chart for training and validation data set

The image above shows the lift chart of an H2O GBM classification model for the training and validation data sets. It confuses me because it looks different from the other lift charts I have seen. Normally the baseline is a 45-degree line and the lift curve bows away from it in a convex shape. In the figure above, if the green line is the lift curve, why does it stay constant at first and then come down to touch the baseline? And why is the baseline not at 45 degrees? Can anyone help me interpret the model using this graph? Does my model perform well?

Kitooos

1 Answer


The black line is not the baseline but the cumulative capture rate. The capture rate of a group/bin is the proportion of all events that fall into that group/bin. E.g. if 90 of the 100 positive outcomes/events in total fall into the first bin, then the capture rate for that bin is 0.9.

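The green line is the cumulative lift curve. Cumulative lift is the cumulative capture rate divided by the fraction of the population targeted so far, so once the whole population is included both values equal 1 and the two lines converge there.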

Whether your model performs well or not depends on your goal. According to the validation metrics, you could capture about 80% of the events by targeting only 50% of the population, which means a lift of about 1.6 (0.8 / 0.5).
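To make the relationship between the two curves concrete, here is a minimal sketch in plain Python. It is not H2O's implementation: the function name `cumulative_gains`, the equal-sized bins, and the toy labels and scores are all assumptions made up for illustration, with the toy data chosen so that the top 50% of rows contain 8 of the 10 events.

```python
def cumulative_gains(labels, scores, n_bins=10):
    """Return one (fraction_targeted, cumulative_capture_rate, cumulative_lift)
    tuple per bin. Hypothetical helper, not part of H2O."""
    # Rank observations from the highest to the lowest predicted score.
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    total_events = sum(ranked)
    rows = []
    for b in range(1, n_bins + 1):
        # Number of rows in the top b bins (at least one row).
        cutoff = max(1, round(b * n / n_bins))
        fraction = cutoff / n                                # x-axis value
        capture_rate = sum(ranked[:cutoff]) / total_events   # black curve
        lift = capture_rate / fraction                       # green curve
        rows.append((fraction, capture_rate, lift))
    return rows


# Made-up example: 20 observations, 10 events, already ordered by score.
labels = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
          0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [(20 - i) / 20 for i in range(20)]

for fraction, capture_rate, lift in cumulative_gains(labels, scores):
    print(f"top {fraction:.0%}: capture rate {capture_rate:.2f}, lift {lift:.2f}")
```

In this toy data the row for the top 50% shows a capture rate of 0.80 and a lift of 1.60, and the last row shows both values at 1.00, which is the point where the two curves meet in the chart.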

vaclav
  • How did you get this 50% and 80%? Is it the point where the cumulative capture rate reaches 1, or the point where the curves start to converge? Can you please explain a bit more about "you could capture about 80% of the events by targeting only 50% of the population"? – Kitooos Oct 08 '18 at 02:57
  • When you take the point 0.5 on the x-axis of the validation (right) plot and look at the cumulative capture rate (black) curve, you see that its value is about 0.8. This information is useful if, for instance, you want to use your model to prioritise whom to target, e.g. with a marketing campaign. If you can predict who would buy if targeted, you can prioritise. In this example, if you target the 50% of people with the highest probability of buying, you would capture 80% of the people who will buy, so you save some money while still capturing the majority of the buyers. – vaclav Oct 09 '18 at 20:11
  • Can you explain how this applies in the healthcare industry? For example, how can the graph help me with predicting whether a patient has cancer or not? – Kitooos Oct 10 '18 at 22:02
  • Assume you have a new and very expensive cancer treatment method that works only for some patients. If you can predict in advance which patients it will work on, you can provide the treatment only to those patients who are most likely to respond to it. In our example you could target the top 50% of the patients, and this way you would cure 80% of all the patients the treatment would work for. Does it make sense? – vaclav Oct 10 '18 at 22:34
  • Your explanation is very clear. But what if there is no additional element like a cancer treatment, and the task is just predicting the event 'whether the patient has cancer or not'? In that case, how can we use this graph? – Kitooos Oct 11 '18 at 21:33
  • Lift has a specific usage, typically for cases where your model is compared against a random baseline in terms of decision making (e.g. the targeting in my examples). If you wish to compare two models in terms of their precision/accuracy, you can also, and perhaps better, use ROC or Precision-Recall curves, see e.g. https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/ . – vaclav Oct 16 '18 at 12:09