Let's say I have a pandas DataFrame and apply `sklearn.model_selection.train_test_split` with the `random_state` parameter set to 1. Now suppose I take the exact same pandas DataFrame and create a Spark DataFrame from it with an instance of `SQLContext`. If I apply PySpark's `randomSplit` function with the `seed` parameter set to 1, am I guaranteed to always obtain the exact same split?
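For concreteness, here is a minimal sketch of the two splits being compared (the toy DataFrame is made up; the PySpark part is shown in comments since it needs a running Spark session). Note the two functions consume their seeds in entirely different algorithms: `train_test_split` draws an exact-size shuffled split, while `randomSplit` samples each row independently, Bernoulli-style, per partition.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

pdf = pd.DataFrame({"x": range(100)})

# sklearn: exact-size split, fully reproducible for a fixed random_state
train_a, test_a = train_test_split(pdf, test_size=0.25, random_state=1)
train_b, test_b = train_test_split(pdf, test_size=0.25, random_state=1)
assert train_a.index.equals(train_b.index)  # same seed -> same rows

# PySpark equivalent (not run here; requires a Spark session):
#   sdf = sqlContext.createDataFrame(pdf)
#   train_s, test_s = sdf.randomSplit([0.75, 0.25], seed=1)
# randomSplit's fractions are only approximate (each row is kept or
# dropped independently), and the outcome depends on how the data is
# partitioned, so seed=1 alone does not make it match sklearn's split.
```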