
How is it possible that running the same Python program twice, with the exact same seeds and static input data, produces different results? Calling the function below repeatedly within one Jupyter notebook session yields identical results; however, after I restart the kernel, the results are different. The same applies when I run the code twice from the command line as a Python script. Is there anything else people do to make sure their code is reproducible? All the resources I found only talk about setting seeds. The randomness is introduced by ShapRFECV (from the probatus library).

This code runs on a CPU only.

MWE (in this code I generate a synthetic dataset and eliminate features using ShapRFECV, in case that's relevant):

import os, random
import numpy as np
import pandas as pd
from probatus.feature_elimination import ShapRFECV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

global_seed = 1234
os.environ['PYTHONHASHSEED'] = str(global_seed)
np.random.seed(global_seed)
random.seed(global_seed)

feature_names = ['f1', 'f2', 'f3_static', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10',
                 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20']

# Code from the tutorial in the probatus documentation
X, y = make_classification(n_samples=100, class_sep=0.05, n_informative=6, n_features=20,
                           random_state=0, n_redundant=10, n_clusters_per_class=1)
X = pd.DataFrame(X, columns=feature_names)

def shap_feature_selection(X, y, seed: int) -> list[str]:
    random_forest = RandomForestClassifier(random_state=seed, n_estimators=70,
                                           max_features='log2', criterion='entropy',
                                           class_weight='balanced')
    # Set to run on one thread only
    shap_elimination = ShapRFECV(clf=random_forest, step=0.2, cv=5,
                                 scoring='f1_macro', n_jobs=1, random_state=seed)

    report = shap_elimination.fit_compute(X, y, check_additivity=True, seed=seed)
    # Return the feature set with the best mean validation metric (f1_macro)
    return report.iloc[[report['val_metric_mean'].idxmax() - 1]]['features_set'].to_list()[0]

Results:

# Results from the first run
shap_feature_selection(X, y, 0)

>>> ['f17', 'f15', 'f18', 'f8', 'f12', 'f1', 'f13']

# Running again in same session
shap_feature_selection(X, y, 0)

>>> ['f17', 'f15', 'f18', 'f8', 'f12', 'f1', 'f13']

# Restarting the kernel and running the exact same command
shap_feature_selection(X, y, 0)

>>> ['f8', 'f1', 'f17', 'f6', 'f18', 'f20', 'f12', 'f15', 'f7', 'f13', 'f11']

Details:

  • Ubuntu 22.04
  • Python 3.9.12
  • Numpy 1.22.0
  • Sklearn 1.1.1
Dreana
  • It suggests that probatus or sklearn is using a different rand. Can you comment out calls one by one and see if the problem goes away? – tdelaney May 19 '23 at 16:41
  • I know that the randomness happens in the `fit_compute` step, which is from the probatus library. But if it uses a different rand, wouldn't it also produce different results in the same (e.g. Jupyter) session? – Dreana May 19 '23 at 17:00
  • Good question! I don't know enough about jupyter to say. Perhaps a C library is loaded and seeded on first use. – tdelaney May 19 '23 at 17:18
  • Btw it's not a Jupyter-specific thing - this also happens when running the code twice from the terminal or from my IDE – Dreana May 19 '23 at 17:50
  • I tried to find out what `sklearn` uses as a default generator, and instead found out that [it is now deprecated](https://pypi.org/project/sklearn/). – pjs May 19 '23 at 20:11
  • @JoelCrypto it does not; the question is why restarting the kernel (thus re-initializing all random number generators to the same states) still produces different results - plus you are quoting from the explanation of setting the random state to `None`, which is certainly and clearly not the case here. It would seem you don't have a clear understanding of the situation - I would kindly suggest you delete the comment now (to reduce clutter). – desertnaut May 19 '23 at 22:56
  • Since you've already narrowed it down to `probatus`, that should be in the question text. You might try setting `cv` to a specific splitter with random state set, instead of an integer (despite `probatus` [saying that should work](https://ing-bank.github.io/probatus/howto/reproducibility.html#static-data-splits); I don't think it does in sklearn...) (a sketch of this is included after the comments). – Ben Reiniger May 20 '23 at 14:35
  • I would suggest setting the seed *before* even importing any of the modules that rely on randomness - they might have created a default RNG instance at import time. – jasonharper May 21 '23 at 20:32
  • @BenReiniger This actually helped! The feature selector still doesn't return the features in the same order somehow, but I can work with that. MANY thanks. – Dreana May 21 '23 at 21:15
  • I just got a chance to try it, and I still get different results with my suggestion for `cv`. – Ben Reiniger May 22 '23 at 15:23
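
Following Ben Reiniger's comment above, here is a minimal sketch of passing an explicit, seeded splitter to `cv` instead of an integer. It assumes `ShapRFECV` accepts a scikit-learn splitter object (as the probatus reproducibility docs linked in that comment suggest) and builds on the imports from the MWE above; all other parameters are copied from the MWE:

from sklearn.model_selection import StratifiedKFold

def shap_feature_selection_fixed_cv(X, y, seed: int) -> list[str]:
    random_forest = RandomForestClassifier(random_state=seed, n_estimators=70,
                                           max_features='log2', criterion='entropy',
                                           class_weight='balanced')
    # Pin the CV folds themselves with a seeded splitter, rather than
    # letting ShapRFECV build its own splitter from cv=5
    splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    shap_elimination = ShapRFECV(clf=random_forest, step=0.2, cv=splitter,
                                 scoring='f1_macro', n_jobs=1, random_state=seed)
    report = shap_elimination.fit_compute(X, y, check_additivity=True, seed=seed)
    return report.iloc[[report['val_metric_mean'].idxmax() - 1]]['features_set'].to_list()[0]

Per the follow-up comments, this stabilized the selected feature set across restarts for the asker (only the ordering still varied, which `sorted(...)` on the returned list works around), while Ben Reiniger reported still seeing different results with it.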

1 Answer


This has now been fixed in probatus (the issue was a bug, apparently connected to the pandas implementation they were using, see here). For me, everything works as expected when using the latest probatus code from the repository (not the released package).
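
Since the discrepancy only shows up across interpreter restarts, a useful sanity check after upgrading is to run the selection in two fresh processes and compare the results. A minimal sketch, where `select_features.py` is a hypothetical script that calls `shap_feature_selection(X, y, 0)` and prints the sorted result:

import subprocess, sys

def run_selection_once() -> str:
    # Run the (hypothetical) selection script in a brand-new interpreter
    result = subprocess.run([sys.executable, 'select_features.py'],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

first, second = run_selection_once(), run_selection_once()
assert first == second, f'Not reproducible across processes:\n{first}\n{second}'
print('Reproducible:', first)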

desertnaut
Dreana