A simple row-wise shuffle in Polars with
df = df.sample(frac=1.0)
has a peak memory usage of 2x the size of the DataFrame (measured with mprof).
Is there a fast way to perform a row-wise shuffle in Polars while keeping memory usage as low as possible? Shuffling column by column (or a batch of columns at a time) with the same seed, or using .take with a random index, does the trick but is quite slow.