
I have a binary classification problem, ran it through Driverless AI (DAI), and am given test AUCs. Where do I find the probability threshold that is used during deployment to score new rows of data?

An example would be a threshold of 0.50, i.e., a predicted probability > 0.50 gets a 1 and a predicted probability < 0.50 gets a 0 (or vice versa) when making a decision. I need the exact threshold, beyond the 4-digit truncated number that the GUI shows as you move along the ROC curve. In the pictures below I've matched the thresholds but can't get the same confusion matrices with the identical threshold. Notice that the false positive counts differ very slightly.

DAI Threshold @ .0301 Threshold

Sklearn Confusion Matrix @ .0301 Threshold
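A minimal sketch of why a 4-digit threshold can give a slightly different confusion matrix than the full-precision one. The data is synthetic, the full-precision threshold `0.0301456789` is a made-up value, and the `>=` comparison is an assumption about how ties are handled, not something confirmed for DAI:

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def confusion_at(y_true, scores, threshold):
    """Confusion matrix when rows with score >= threshold are labelled 1."""
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    return confusion_matrix(y_true, y_pred, labels=[0, 1])


# Synthetic data: the negative row scored 0.03014 falls between the two thresholds.
y_true = np.array([0, 0, 0, 1, 1])
scores = np.array([0.01, 0.03014, 0.0302, 0.2, 0.9])

cm_display = confusion_at(y_true, scores, 0.0301)     # 4-digit value as read off a GUI
cm_full = confusion_at(y_true, scores, 0.0301456789)  # hypothetical full-precision value

print(cm_display)  # rows: actual 0/1, columns: predicted 0/1
print(cm_full)
```

Any row whose score lies between the displayed and the full-precision threshold changes class, which shows up as a small shift between false positives and true negatives.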

UPDATED ANSWER: Download the "Experiments Summary" after an experiment completes on DAI. Inside the zip file you'll find an ensemble_roc_test.json that gives thresholds to up to 10 digits.

kevin_theinfinityfund
  • Please try searching the experiment logs for lines that say: 'ROC metrics on test data' or 'ROC metrics on validation' depending on whether you supplied a test set. There should be a large table of probability thresholds and their corresponding scores. – Joe May 15 '19 at 21:39
  • I did that and could find the results, but the log file is very messy in a text editor. Is there a best practice for scanning or parsing this log nicely? – kevin_theinfinityfund May 15 '19 at 22:02
  • On Windows I'd recommend installing and using Notepad++. – Joe May 15 '19 at 22:47
  • I ended up using Atom (for anyone else who's interested) on Windows and it worked nicely as well. Thanks for the help. – kevin_theinfinityfund May 16 '19 at 13:46
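As an alternative to reading the log in an editor, here is a rough sketch that pulls out the lines following the markers mentioned in the comments above. The function name `extract_roc_sections`, the `context` size, and the sample lines are all made up for illustration; the real log layout may differ, so adjust `context` to the size of the table you find:

```python
MARKERS = ("ROC metrics on test data", "ROC metrics on validation")


def extract_roc_sections(lines, context=5):
    """Yield (line_number, block) for each marker line plus `context` lines after it."""
    lines = list(lines)
    for i, line in enumerate(lines):
        if any(marker in line for marker in MARKERS):
            yield i + 1, lines[i : i + 1 + context]


# Placeholder lines only; this is NOT real DAI log output.
sample = [
    "unrelated line",
    "ROC metrics on test data",
    "placeholder row 1",
    "placeholder row 2",
    "unrelated line",
]
sections = list(extract_roc_sections(sample, context=2))
for line_number, block in sections:
    print(f"marker at line {line_number}")
    print("\n".join(block))

# For a real log (path is hypothetical):
# with open("experiment.log", encoding="utf-8", errors="replace") as fh:
#     sections = list(extract_roc_sections(fh.read().splitlines(), context=50))
```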

2 Answers


Driverless AI predictions return 'scores', so no threshold is applied to the prediction; how the score is used is up to the user. The ROC curve on the experiment page shows the recommended threshold for optimizing different metrics. For example, in the screenshot below the mouse is hovering over the 'Best F1' circle to get a summary that includes the threshold:

[Screenshot: ROC curve on the experiment page, with the 'Best F1' summary showing the threshold]
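If you want a full-precision "best F1" threshold computed from the raw scores yourself, something along these lines may work. This is a sketch using scikit-learn's `precision_recall_curve`, not necessarily how DAI picks its threshold; `best_f1_threshold` is a name made up here and the data is a toy example:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) for the candidate threshold with the highest F1."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The final precision/recall pair has no threshold attached, so drop it.
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.maximum(p + r, 1e-12)  # guard against 0/0
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Toy labels and scores, for illustration only.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

threshold, f1 = best_f1_threshold(y_true, scores)
print(repr(threshold), f1)
```

Here a row is treated as positive when its score is at or above the returned threshold, which is scikit-learn's convention for this function.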

Joe
  • I appreciate the response. I have a unique case where I need to know the entire threshold number, beyond 4 digits, to reproduce the confusion matrix manually. Where can I find that number explicitly within the log files/results of the experiment? – kevin_theinfinityfund May 15 '19 at 19:32

UPDATED ANSWER: Download the "Experiments Summary" after an experiment completes on DAI. Inside the zip file you'll find an ensemble_roc_test.json that gives thresholds to up to 10 digits.

Experiment_Summary Thresholds
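A small sketch for reading that file straight out of the zip without unpacking it. Only the file name `ensemble_roc_test.json` comes from the summary; the helper name `load_roc_json`, the archive path, and the demo content are made up, and nothing is assumed about the JSON's internal layout, so inspect it before relying on any particular key:

```python
import io
import json
import zipfile


def load_roc_json(zip_file, suffix="ensemble_roc_test.json"):
    """Parse the first archive member whose name ends with `suffix`."""
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if name.endswith(suffix):
                with zf.open(name) as fh:
                    return json.load(fh)
    raise FileNotFoundError(f"no member ending in {suffix!r}")


# Demo on an in-memory archive with placeholder content (not the real schema).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("summary/ensemble_roc_test.json",
                json.dumps({"placeholder": [0.0301456789]}))
buf.seek(0)

roc = load_roc_json(buf)
# Look at the top-level structure before assuming where the thresholds live.
print(type(roc).__name__, list(roc) if isinstance(roc, dict) else len(roc))

# With a downloaded summary (path is hypothetical):
# roc = load_roc_json("experiment_summary.zip")
```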

kevin_theinfinityfund