Regarding the seeding system when running machine learning algorithms with Scikit-Learn, there are three different things usually mentioned:
- random.seed
- np.random.seed
- random_state at SkLearn (cross-validation iterators, ML algorithms etc)
I am already aware of this FAQ of SkLearn about how to fix the global seeding system, and of articles which point out that this should not be simply a FAQ.
My ultimate question is: how can I get absolutely reproducible results when running an ML algorithm with SkLearn?
In more detail,
- If I only use np.random.seed and do not specify any random_state at SkLearn, will my results be absolutely reproducible?
And one more question, at least for the sake of knowledge:
- How exactly are np.random.seed and the random_state of SkLearn internally related? How does np.random.seed affect the seeding system (random_state) of SkLearn and make it (at least hypothetically speaking) reproduce the same results?
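
To make the question concrete, here is a minimal sketch of the two setups I am comparing. The synthetic dataset, the choice of RandomForestClassifier and KFold, and the helper names scores_with_global_seed and scores_with_random_state are only illustrative assumptions, not part of any particular project. The one fact I am relying on is that sklearn.utils.check_random_state(None) is documented to return the RandomState singleton used by np.random, which is presumably how the global seed could reach SkLearn at all.

```python
import random

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score


def scores_with_global_seed(seed):
    """Setup 1 (hypothetical helper): seed only the global generators.

    Every random_state is left at its default (None), so SkLearn falls
    back to NumPy's global RandomState.
    """
    random.seed(seed)
    np.random.seed(seed)
    X, y = make_classification(n_samples=200, n_features=10)
    cv = KFold(n_splits=5, shuffle=True)
    model = RandomForestClassifier(n_estimators=20)
    return cross_val_score(model, X, y, cv=cv)


def scores_with_random_state(seed):
    """Setup 2 (hypothetical helper): pass an explicit integer random_state
    to every object that accepts one, without touching the global seed."""
    X, y = make_classification(n_samples=200, n_features=10, random_state=seed)
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    model = RandomForestClassifier(n_estimators=20, random_state=seed)
    return cross_val_score(model, X, y, cv=cv)


if __name__ == "__main__":
    # Run each setup twice with the same seed and compare the fold scores.
    print(np.array_equal(scores_with_global_seed(0), scores_with_global_seed(0)))
    print(np.array_equal(scores_with_random_state(0), scores_with_random_state(0)))
```

This only compares repeated runs inside one single-threaded process. Whether Setup 1 stays reproducible once other code also draws from the global generator, or once work is spread over several processes, is exactly the part I am unsure about.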