How to oversample a dataframe in pyspark?
df.sample(fractions, seed)
This only samples a fraction of the df; it can't oversample.
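You can oversample with the sample method by sampling with replacement and passing a fraction greater than 1:
df.sample(withReplacement=True, fraction=total_percent_of_upsample, seed=seed)
The signature is:
sample(withReplacement, fraction, seed=None)
Setting withReplacement to True means rows are drawn with replacement, so the same row can appear more than once and fraction is allowed to exceed 1.0. For example, a fraction of 3.0 returns roughly three times as many rows as df. The result size is only approximate, since sample does not guarantee an exact row count.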
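To make this concrete, here is a minimal sketch of how it could be wired up. The helper names oversample_fraction, oversample, and demo are made up for illustration and are not part of PySpark; only DataFrame.sample, SparkSession, and spark.range are real API calls. It assumes you want to grow a smaller class to roughly the size of a larger one.

```python
def oversample_fraction(minority_count, majority_count):
    """Hypothetical helper: fraction to pass to sample() so the
    minority class ends up roughly as large as the majority class."""
    if minority_count <= 0:
        raise ValueError("minority_count must be positive")
    return majority_count / minority_count


def oversample(df, fraction, seed=None):
    """Hypothetical helper: sample df with replacement.

    With replacement, fraction is the expected number of times each
    row is chosen, so values above 1.0 are allowed.
    """
    return df.sample(withReplacement=True, fraction=fraction, seed=seed)


def demo():
    # Requires pyspark; imported here so the helpers above can be
    # used and tested without a Spark installation.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("oversample-demo")
        .getOrCreate()
    )
    df = spark.range(100)  # 100 rows, single column "id"
    upsampled = oversample(df, fraction=3.0, seed=42)
    # The count is random: close to 300, but usually not exactly 300.
    print(upsampled.count())
    spark.stop()
```

Call demo() to try it against a local Spark session. Note that sampling with replacement does not guarantee every original row shows up in the result; if you need that, one option is to keep the original df and union it with a sample drawn at fraction - 1.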