I've managed to solve this problem in two steps.

import polars as pl

text = "a brown fox jumps over a lazy dog's head"
step = 3
df = pl.DataFrame({"a": text.split(" ")})

# Step 1: split the rows by their position modulo `step`.
first = df.filter(pl.arange(0, pl.count()) % step == 0)
second = df.filter(pl.arange(0, pl.count()) % step == 1)
third = df.filter(pl.arange(0, pl.count()) % step == 2)

# Step 2: reassemble the three pieces as columns.
dff = pl.DataFrame({
    "first": first["a"],
    "second": second["a"],
    "third": third["a"],
})
print(dff)
shape: (3, 3)
┌───────┬────────┬───────┐
│ first ┆ second ┆ third │
│ ---   ┆ ---    ┆ ---   │
│ str   ┆ str    ┆ str   │
╞═══════╪════════╪═══════╡
│ a     ┆ brown  ┆ fox   │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ jumps ┆ over   ┆ a     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ lazy  ┆ dog's  ┆ head  │
└───────┴────────┴───────┘

I have the impression that this should be easy to solve in a single chain of expressions, but I haven't managed to do so. Any suggestions?

pedrosaurio

1 Answer

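import polars as pl

text = "a brown fox jumps over a lazy dog's head"
step = 3
df = pl.DataFrame({"a": text.split(" ")})

(
    df.with_column(
        # Integer division assigns every `step` consecutive rows to one group.
        (pl.arange(0, pl.count()) // step).alias("step")
    )
    .groupby("step", maintain_order=True)
    .agg([
        # take(i) picks the i-th element of each group as its own column.
        pl.col("a").take(i).alias(name)
        for i, name in enumerate(["first", "second", "third"])
    ])
)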
shape: (3, 4)
┌──────┬───────┬────────┬───────┐
│ step ┆ first ┆ second ┆ third │
│ ---  ┆ ---   ┆ ---    ┆ ---   │
│ i64  ┆ str   ┆ str    ┆ str   │
╞══════╪═══════╪════════╪═══════╡
│ 0    ┆ a     ┆ brown  ┆ fox   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 1    ┆ jumps ┆ over   ┆ a     │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2    ┆ lazy  ┆ dog's  ┆ head  │
└──────┴───────┴────────┴───────┘
ritchie46
So `take()` exists! I had searched for `slice()` in an attempt to index a column, but surely non-contiguous indexes are by definition not a slice. Thanks! – pedrosaurio Sep 01 '22 at 20:52
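
The index arithmetic behind both approaches can be sketched in plain Python, without any polars calls. This is only an illustration under stated assumptions, not part of either post: the names `words`, `names`, `columns` and `rows` are hypothetical. It shows that the question's `% step` filters correspond to strided selections, and that the answer's `// step` key corresponds to consecutive chunks of `step` words.

```python
# Plain-Python sketch of the reshaping; no polars calls are used here.
text = "a brown fox jumps over a lazy dog's head"
step = 3
words = text.split(" ")

# Hypothetical list of column names, matching the ones used in the posts.
names = ["first", "second", "third"]

# Column i holds every step-th word starting at offset i (index % step == i),
# which mirrors the three filters in the question.
columns = {name: words[i::step] for i, name in enumerate(names)}

# Row r holds the words whose index // step == r, which mirrors the grouping
# key in the answer. A trailing incomplete chunk, if any, is dropped here.
rows = [words[r * step:(r + 1) * step] for r in range(len(words) // step)]

print(columns)
print(rows)
```

Note that a Python slice with a step, such as `words[i::step]`, does select regularly spaced, non-contiguous elements. The `columns` dict has the same shape as the dict passed to `pl.DataFrame` in the question.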