I'm learning polars (as substitute of pandas) and I would reply some excel functions.
In particular average if over a rolling windows.
Let us suppose we have a column with positive and negative value, how can I create a new column with rolling average only if all the value in the column are positive?
import polars as pl
df = pl.DataFrame(
{
"Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98", "31/05/98", "07/06/98"],
"Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20 ,15.07 ,12.59]
}
)
df = df.with_columns([(
pl.col("Close").pct_change().alias("Close Returns")
)])
This creates a data frame with the column "Close Returns" and the new column will be it's average on a fixed windows only if the are all positive.
And if I want to create a new column as result of quotient positive average over negative?
As example for a window of two elements, in the image below there is the first which is null and do nothing. First widows contains a positive and a negative so returns zero (I need 2 positive value) while last window contains two negative and the mean can be computed.
Here my solution but I'm not satisfied:
import polars as pl
dataset = pl.DataFrame(
{
"Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98", "31/05/98", "07/06/98"],
"Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20 ,15.07 ,12.59]
}
)
q = dataset.lazy().with_column(pl.col("Date").str.strptime(pl.Date, fmt="%d/%m/%y"))
df = q.collect()
df = df.with_columns([(
pl.col("Close").pct_change().alias("Close Returns")
)])
lag_vector = [2, 6, 7, 10, 12, 13]
for lag in lag_vector:
out = df.groupby_rolling(
index_column="Date", period=f"{lag}w"
).agg([
pl.col("Close Returns").filter(pl.col("Close Returns") >= 0).mean().alias("positive mean"),
pl.col("Close Returns").filter(pl.col("Close Returns") < 0).mean().alias("negative mean"),
])
out["negative mean"] = out["negative mean"].fill_null("zero")
out["positive mean"] = out["positive mean"].fill_null("zero")
out = out.with_columns([
(pl.col("positive mean") / (pl.col("positive mean") - pl.col("negative mean"))).alias(f"{lag} lag mean"),
])
df = df.join(out.select(["Date", f"{lag} lag mean"]), left_on="Date", right_on="Date")