I am working on a pilot with the Amazon Web Services Machine Learning service and I have some doubts.

I have used a Binary Classifier model and, in my opinion, the histogram of the results does not match the numerical results. According to the histogram, there appear to be more False Positives than True Negatives, but the numerical results do not show this.

Histogram

  • 778 true positives
  • 15,178 true negatives
  • 6,663 false positives
  • 173 false negatives

Can anyone offer some insight into this matter?

Thank you,

2 Answers

You have control over the cut-off score (the vertical line) and you can move it left or right. In your diagram, you moved the cut-off score far to the left, which means that you will predict Yes in most cases and will therefore have many more false positives (observations wrongly predicted as positive, i.e. Yes) than false negatives.
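To illustrate the effect of moving the cut-off, here is a minimal sketch in Python. The scores and labels are made up for illustration (they are not the data from the question), `confusion_counts` is a hypothetical helper, and treating a score equal to the threshold as positive is an assumption rather than documented Amazon ML behavior.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, TN, FP, FN, predicting positive when score >= threshold (assumed rule)."""
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


# Made-up scores and true labels, purely for illustration.
scores = [0.01, 0.03, 0.05, 0.10, 0.40, 0.60, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

low = confusion_counts(scores, labels, 0.02)   # cut-off far to the left
high = confusion_counts(scores, labels, 0.50)  # cut-off near the middle

print("threshold 0.02:", low)
print("threshold 0.50:", high)
```

With the low cut-off almost every observation is predicted positive, so false positives rise and false negatives fall; raising the cut-off does the opposite.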

Guy
  • Thank you Guy for your answer, but the issue here is with the histogram of the negative observations. As can be seen in the attached image, with a threshold of 0.02 and looking only at the negative observations histogram, the area to the left of the threshold is considerably smaller than the area to the right. That does not match the obtained results of 15,178 true negatives (area to the left of the threshold) and 6,663 false positives (area to the right of the threshold). – Joan Salvatella Aug 02 '16 at 10:53
  • You don't see all the true negatives (gray area on the top left side) as you probably have many zero or close-to-zero values. If you compare the false predictions (false negatives = 173 and false positives = 6,663), you can easily see the ratio of these striped areas to the left and right of the cut-off line. – Guy Aug 02 '16 at 17:56
  • I don't think it makes any sense to not be able to see all the true negatives. If there are a lot of zero and close to zero values the histogram should have a very high number and that's it... wouldn't it? Unless the vertical axis scale is logarithmic! – Joan Salvatella Aug 29 '16 at 07:01

This is the answer to my question from the Amazon Web Services Support team through their forums:

After doing some digging around, I found that the Y-axis scaling is logarithmic for the histograms, which explains why a direct 1:1 area comparison of the true negatives and false positives would not be consistent with the numerical results. If we didn't display a logarithmic scale, my guess would be that most of your Y-axis would be dominated by the true negative and true positive results and the false positives and false negatives could be too small to noticeably see.

Reference: https://forums.aws.amazon.com/message.jspa?messageID=733706

If the Y-axis is logarithmic, the numerical results DO match the provided histograms.
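A rough sketch of why a logarithmic Y-axis breaks the area comparison is below. Only the two totals (15,178 true negatives and 6,663 false positives) come from the question; the way they are spread across bins is an assumption made up for illustration, as is the idea that bar height is proportional to log10 of the count (i.e. the axis baseline sits at 1). `visual_area` is a hypothetical helper.

```python
import math

# Totals are from the question; the per-bin split below is an assumption.
left_bins = [15178]              # true negatives, all packed into one bin near score 0
right_bins = [136] * 48 + [135]  # false positives, spread thinly over 49 bins


def visual_area(bins, log_scale):
    """Sum of bar heights for unit-width bars, as drawn on a linear or log10 axis."""
    if log_scale:
        # Assumes bar height is log10(count); empty bins draw nothing.
        return sum(math.log10(count) for count in bins if count > 0)
    return float(sum(bins))


linear_left = visual_area(left_bins, log_scale=False)
linear_right = visual_area(right_bins, log_scale=False)
log_left = visual_area(left_bins, log_scale=True)
log_right = visual_area(right_bins, log_scale=True)

print("linear axis: left =", linear_left, "right =", linear_right)
print("log axis:    left =", round(log_left, 2), "right =", round(log_right, 2))
```

Under these assumptions the left side has the larger area on a linear axis, but on a log axis one very tall bar is compressed to a modest height while many short bars each keep a visible height, so the right side looks larger even though it holds fewer observations.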