Imagine we have the following polars dataframe:
Feature 1 | Feature 2 | Labels |
---|---|---|
100 | 25 | 1 |
150 | 18 | 0 |
200 | 15 | 0 |
230 | 28 | 0 |
120 | 12 | 1 |
130 | 34 | 1 |
150 | 23 | 1 |
180 | 25 | 0 |
Now, using polars, we want to drop each row where Labels == 0
with 50% probability. An example output would be:
Feature 1 | Feature 2 | Labels |
---|---|---|
100 | 25 | 1 |
200 | 15 | 0 |
230 | 28 | 0 |
120 | 12 | 1 |
130 | 34 | 1 |
150 | 23 | 1 |
I think filter and sample might be handy. I have tried the following, but it does not work:
df = df.drop(df.filter(pl.col("Labels") == 0).sample(frac=0.5))
How can I make it work?