how to calculate pct_change by polars?

Question

Now I have a dataframe like this:

df = pd.DataFrame({"asset":["a","b","c","a","b","c","b","c"],"v":[1,2,3,4,5,6,7,8],"date":["2017","2011","2012","2013","2014","2015","2016","2010"]})

I can calculate the pct_change by groupby and my function like this:

def fun(df):
    df = df.sort_values(by="date")
    df["pct_change"] = df["v"].pct_change()
    return df
df = df.groupby("asset",as_index=False).apply(fun)

Now I want to know how can I get the same result by polars?

ritchie46 · Answer 1 · 2022-07-18T11:24:22.337

Here are two options. One using window functions, and one using groupby + explode.

You should benchmark and see which is faster on your use case.

preparing data

df = pl.DataFrame({
    "asset":["a","b","c","a","b","c","b","c"],
    "v":[1,2,3,4,5,6,7,8],
    "date":["2017","2011","2012","2013","2014","2015","2016","2010"]
})

using window functions


(  
  df.sort(["asset", "date"])
    .with_columns([
       pl.col("v").pct_change().over("asset").alias("pct_change")
    ])
)

using groupby + explode

(df.groupby("asset")
   .agg([
       pl.all().first(),
       pl.col("v").sort_by("date").pct_change().alias("pct_change")
   ]).explode("pct_change")
)

Result

Both output:

shape: (8, 4)
┌───────┬─────┬──────┬────────────┐
│ asset ┆ v   ┆ date ┆ pct_change │
│ ---   ┆ --- ┆ ---  ┆ ---        │
│ str   ┆ i64 ┆ str  ┆ f64        │
╞═══════╪═════╪══════╪════════════╡
│ a     ┆ 4   ┆ 2013 ┆ null       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a     ┆ 1   ┆ 2017 ┆ -0.75      │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b     ┆ 2   ┆ 2011 ┆ null       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b     ┆ 5   ┆ 2014 ┆ 1.5        │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b     ┆ 7   ┆ 2016 ┆ 0.4        │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ c     ┆ 8   ┆ 2010 ┆ null       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ c     ┆ 3   ┆ 2012 ┆ -0.625     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ c     ┆ 6   ┆ 2015 ┆ 1.0        │
└───────┴─────┴──────┴────────────┘

I use your code "groupby + explode",I find the column "date" is not show as you. it's just "2011","2012","2017" etc — user3105812, Jul 19 '22 at 06:13
you can run the code "groupby + explode" ,it's not same as " window functions" — user3105812, Jul 19 '22 at 08:16
The output is correct per group. If you want to maintain the order of the groups, you can add `maintain_order=True` to the `groupby`, but this will be slower. — ritchie46, Jul 19 '22 at 08:34
For `groupby+explode` , `(df1.sort(['asset','date']).groupby("asset",maintain_order=True) .agg([ pl.col('v').list().alias('v'), pl.col('date').list().alias('date'), pl.col("v").pct_change().alias("pct_change") ]).explode(['v',"date","pct_change"]) )` — Aung, Jan 13 '23 at 15:20

how to calculate pct_change by polars?

1 Answers1

preparing data

using window functions

using groupby + explode

Result