I need to know why the random_state argument in cross_validation.train_test_split is an integer rather than a Boolean, since its role is to flag whether the allocation is random or not.
2 Answers
random_state is not only a flag for whether to randomize, but also specifies which random seed to use. If you choose random_state = 3, you will "randomly" split the dataset, but you can reproduce the same split each time. That is, each call with the same dataset will yield the same split, which is not the case if you don't specify the random_state parameter.
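To make that concrete, here is a minimal sketch. It assumes a recent scikit-learn, where train_test_split is imported from sklearn.model_selection (the older cross_validation module from the question has since been removed); the toy arrays X and y are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for this example: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two independent calls with the same data and the same seed.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.3, random_state=3
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.3, random_state=3
)

# The rows are shuffled, but identically in both calls.
same_split = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)
print(same_split)  # True
```

If you leave random_state out, the two calls are free to produce different splits.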
The reason I use quotation marks is that the split is actually pseudorandom. Wikipedia explains pseudorandomness like this:
A pseudorandom process is a process that appears to be random but is not. Pseudorandom sequences typically exhibit statistical randomness while being generated by an entirely deterministic causal process. Such a process is easier to produce than a genuinely random one, and has the benefit that it can be used again and again to produce exactly the same numbers - useful for testing and fixing software.
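The same idea can be seen without scikit-learn at all. This is only a sketch using Python's standard random module; the helper name draw is my own and is there just for illustration.

```python
import random

def draw(seed, n=5):
    """Return n pseudorandom integers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, does not touch global state
    return [rng.randint(0, 99) for _ in range(n)]

first = draw(3)
second = draw(3)

# Same seed, same deterministic process, so the sequences are identical.
print(first == second)  # True
```

A different seed will generally give a different sequence, which is why the parameter is an integer and not just an on/off flag.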

To expand a bit further on Kelvin's answer: if you want a different random train-test split on each run, don't specify the random_state parameter. If you want an identically reproducible split each time, specify random_state with an integer of your choice.
