Extracting failed columns from Deepchecks

Question

How can I extract the failed checks and the exact "problematic" columns from the result of a Deepchecks custom suite? In my example, two checks failed: 'Feature Drift' and 'Multivariate Drift'. Across both checks, the "problematic" columns were 'col_1', 'col_5' and 'col_30'.

How can I get something like this using Python:

Check 'Feature Drift' failed because of: 'col_1' (Drift score = 0.9), 'col_5' (Drift score = 0.7)
Check 'Multivariate Drift' failed because of: 'col_1' and 'col_30' with domain_classifier_drift_score = 0.85

My code is:

from deepchecks.tabular import Dataset, Suite
from deepchecks.tabular.checks import FeatureDrift, LabelDrift, MultivariateDrift, NewCategoryTrainTest

columns_metadata = {'cat_features': categorical_cols, 'label': y_label_col}
train_dataset = Dataset(df=train_df, **columns_metadata)
test_dataset = Dataset(df=test_df, **columns_metadata)

custom_drift_suite = Suite(
    'My_custom_drift_suite',
    FeatureDrift().add_condition_drift_score_less_than(
        max_allowed_categorical_score=0.2, max_allowed_numeric_score=0.2),
    # Drift in several features at a time (can't be recognized with Feature Drift)
    MultivariateDrift().add_condition_overall_drift_value_less_than(0.4),
    LabelDrift(),
    # Categories in the test set that are missing from the train set
    NewCategoryTrainTest(),
)
custom_suite_ans = custom_drift_suite.run(train_dataset=train_dataset, test_dataset=test_dataset)

Thanks :) Boris

matanper · Answer 1 · 2023-03-23T08:58:26.030

The suite result contains check results, and each check result contains condition results. The following code prints the condition results of every check that did not pass:

import deepchecks 

for check_result in custom_suite_ans.get_not_passed_checks():
    for condition in check_result.conditions_results:
        print(condition.name + ': ' + condition.details)
        # Access the result of the check - can get columns, etc
        print(check_result.value)
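
Since `check_result.value` is shaped differently per check, one way to get the columns out is a small registry of per-check extractors behind a single function. The following is only a sketch under assumptions, not a Deepchecks API: the helper names (`failed_columns`, `EXTRACTORS`, and the two extractor functions) are hypothetical, and the value layouts (a per-column dict with a 'Drift score' key for Feature Drift, and 'domain_classifier_feature_importance' for Multivariate Drift) are assumed and should be verified against your installed version. The sample dicts are illustrative stand-ins, not real suite output.

```python
from typing import Any, Callable, Dict, List, Tuple

Offenders = List[Tuple[str, float]]  # (column name, score) pairs


def feature_drift_offenders(value: Dict[str, Dict[str, Any]],
                            max_allowed: float = 0.2) -> Offenders:
    # ASSUMED layout: {column: {'Drift score': float or None, ...}}
    found = [(col, info['Drift score']) for col, info in value.items()
             if info.get('Drift score') is not None
             and info['Drift score'] >= max_allowed]
    return sorted(found, key=lambda pair: pair[1], reverse=True)


def multivariate_drift_offenders(value: Dict[str, Any],
                                 min_importance: float = 0.1) -> Offenders:
    # ASSUMED layout: {'domain_classifier_drift_score': float,
    #                  'domain_classifier_feature_importance': {column: float}}
    # Here the "score" of a column is its importance to the domain classifier.
    importance = value.get('domain_classifier_feature_importance') or {}
    found = [(col, imp) for col, imp in importance.items()
             if imp >= min_importance]
    return sorted(found, key=lambda pair: pair[1], reverse=True)


# Hypothetical registry keyed by the check header; add one entry per check type.
EXTRACTORS: Dict[str, Callable[[Any], Offenders]] = {
    'Feature Drift': feature_drift_offenders,
    'Multivariate Drift': multivariate_drift_offenders,
}


def failed_columns(check_name: str, value: Any) -> Offenders:
    """Return (column, score) pairs, or [] for checks without an extractor."""
    extractor = EXTRACTORS.get(check_name)
    return extractor(value) if extractor else []


# Illustrative stand-ins for check_result.value (not real suite output).
feature_value = {
    'col_1': {'Drift score': 0.9},
    'col_5': {'Drift score': 0.7},
    'col_2': {'Drift score': 0.05},
}
multi_value = {
    'domain_classifier_drift_score': 0.85,
    'domain_classifier_feature_importance': {'col_1': 0.6, 'col_30': 0.4, 'col_2': 0.0},
}

for name, value in [('Feature Drift', feature_value),
                    ('Multivariate Drift', multi_value)]:
    print(f"Check '{name}' failed because of: {failed_columns(name, value)}")

# With a real suite result this would look roughly like the loop below,
# assuming check_result.get_header() returns the names used in EXTRACTORS:
# for check_result in custom_suite_ans.get_not_passed_checks():
#     print(failed_columns(check_result.get_header(), check_result.value))
```

The thresholds default to the 0.2 used in the question's Feature Drift condition and to an arbitrary 0.1 importance cutoff. You still write one extractor per check type, but the calling code stays the same for every check.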

edited Mar 23 '23 at 08:58

answered Mar 21 '23 at 17:24

`check_result.value` has a different format per check, i.e. FeatureDrift() and MultivariateDrift() return different formats. Is there a way to write a function that returns the problematic columns for each check, without needing an extraction function per check type? – Boris Mar 22 '23 at 11:26
And for failed checks I assume that `custom_suite_ans.get_not_passed_checks()` can be used. The question is how to filter only the failed columns and their scores from there. – Boris Mar 22 '23 at 12:30
Yes, you can use `custom_suite_ans.get_not_passed_checks`; I simplified my answer accordingly. As for the format, there isn't an easier way currently. In my opinion the solution would be a standardized format. Please open an issue in the deepchecks GitHub repository describing your need. – matanper Mar 23 '23 at 09:01
Thanks! I created a new issue on GitHub as you suggested: https://github.com/deepchecks/deepchecks/issues/2416 – Boris Mar 23 '23 at 11:35
