I have a dataframe that looks something like this:
import polars as pl

# pl.arange was deprecated in favor of pl.int_range in newer Polars releases
df = pl.DataFrame({
    "group": ["foo", "bar", "baz"],
    "elements": [
        pl.int_range(0, 100, eager=True),
        pl.int_range(200, 300, eager=True),
        pl.int_range(300, 400, eager=True),
    ],
    "weight": [0.1, 0.5, 0.4],
})
print(df)
┌───────┬───────────────────┬────────┐
│ group ┆ elements ┆ weight │
│ --- ┆ --- ┆ --- │
│ str ┆ list[i64] ┆ f64 │
╞═══════╪═══════════════════╪════════╡
│ foo ┆ [0, 1, … 99] ┆ 0.1 │
│ bar ┆ [200, 201, … 299] ┆ 0.5 │
│ baz ┆ [300, 301, … 399] ┆ 0.4 │
└───────┴───────────────────┴────────┘
How would I randomly sample, say, 5 elements from each of the lists in the elements column, so that my dataframe looks something like this?
┌───────┬───────────────────────┬────────┐
│ group ┆ elements ┆ weight │
│ --- ┆ --- ┆ --- │
│ str ┆ list[i64] ┆ f64 │
╞═══════╪═══════════════════════╪════════╡
│ foo ┆ [7,42,19,74,33] ┆ 0.1 │
│ bar ┆ [209,277,222,291,260] ┆ 0.5 │
│ baz ┆ [300,347,312,398,369] ┆ 0.4 │
└───────┴───────────────────────┴────────┘
If I then wanted to sample a total of 1000 elements from across all groups, weighted according to the weight column, how would I go about doing that?
I've seen the question "Sample from each group in polars dataframe?", which I think is similar, but so far I haven't been able to come up with a combination of expressions that works.