sklearn's KFold function with shuffle and random_state

Question

I'm trying to understand how to use the cross-validation class sklearn.model_selection.KFold. If I define (as in this tutorial)

from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=False, random_state=100)

I get

ValueError: Setting a random_state has no effect since shuffle is False.
You should leave random_state to its default (None), or set shuffle=True.

What does this error mean and why is it necessary to set random_state=None or shuffle=True?

score 5 · Answer 1 · edited Jun 29 '21 at 09:41


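Shuffling in this context means that the data is randomly reordered before it is split into train/test folds. The random_state seeds that shuffle, so the same folds are produced on every run. With shuffle=False the folds are simply consecutive blocks of the data in its original order, so nothing random happens and random_state has nothing to control. That is why the two settings are rejected together.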
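To make the difference concrete, here is a minimal sketch. The toy array X and the helper fold_test_indices are made up for illustration and are not part of the question; only KFold and its split method come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples, invented purely for this illustration.
X = np.arange(10).reshape(-1, 1)


def fold_test_indices(kf):
    """Hypothetical helper: collect the test indices of every fold as lists."""
    return [test.tolist() for _, test in kf.split(X)]


# shuffle=False (the default): folds are consecutive blocks of the data.
ordered = fold_test_indices(KFold(n_splits=5))
print(ordered)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# shuffle=True with a fixed seed: indices are permuted, but reproducibly.
first = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=100))
second = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=100))
print(first == second)  # True
```

The exact shuffled indices depend on the seed, so they are not shown here; the point is only that two splitters built with the same random_state agree with each other.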

edited Jun 29 '21 at 09:41 by desertnaut

answered Jun 28 '21 at 20:52 by Taylrl

Thanks, what references do you recommend for learning more on data preprocessing? I don't find the sklearn docs that helpful. – Medulla Oblongata Jun 28 '21 at 20:57

That's the correct answer indeed, although I confess I am puzzled why the sklearn designers decided to throw an error in this case; arguably a warning would be more than enough. – desertnaut Jun 29 '21 at 09:40
Thanks! This worked for me. kfold = KFold(n_splits=10, random_state=10, shuffle=True) – Sanushi Salgado Apr 09 '22 at 11:04

score 3 · Answer 2 · edited Jun 04 '22 at 19:33


KFold defaults to shuffle=False. If you set random_state to a value, you must also enable shuffling with shuffle=True; the call then works.

Example:

from sklearn import model_selection

k_fold = model_selection.KFold(n_splits=10, shuffle=True, random_state=10)
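For completeness, a small sketch of the two configurations that are accepted, next to the one from the question. The variable names are illustrative. The try/except is there because, as far as I know, only recent scikit-learn releases raise the ValueError quoted in the question, while older ones may only warn.

```python
from sklearn.model_selection import KFold

# Option 1: keep the original order and leave random_state at its default (None).
kf_ordered = KFold(n_splits=5, shuffle=False)

# Option 2: shuffle, and fix the seed so the shuffle is repeatable.
kf_shuffled = KFold(n_splits=5, shuffle=True, random_state=100)

# The combination from the question: a seed without shuffling.
try:
    KFold(n_splits=5, shuffle=False, random_state=100)
    rejected = False  # assumption: an older release that does not raise
except ValueError as err:
    rejected = True
    print(err)
```

Which option to pick depends on the data: if the rows are sorted (for example by class or by time), unshuffled folds will reflect that ordering.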

edited Jun 04 '22 at 19:33 by RiveN

answered Jun 02 '22 at 21:12 by khalil tekil

