0

I'm learning polars (as substitute of pandas) and I would reply some excel functions.

In particular average if over a rolling windows.

Let us suppose we have a column with positive and negative value, how can I create a new column with rolling average only if all the value in the column are positive?

import polars as pl

df = pl.DataFrame(
    {
        "Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98", "31/05/98", "07/06/98"],
        "Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20 ,15.07 ,12.59]
    }
)

df = df.with_columns([(
    pl.col("Close").pct_change().alias("Close Returns")
)])

This creates a data frame with the column "Close Returns" and the new column will be it's average on a fixed windows only if the are all positive.

And if I want to create a new column as result of quotient positive average over negative?

As example for a window of two elements, in the image below there is the first which is null and do nothing. First widows contains a positive and a negative so returns zero (I need 2 positive value) while last window contains two negative and the mean can be computed.

enter image description here

Here my solution but I'm not satisfied:

import polars as pl

dataset = pl.DataFrame(
    {
        "Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98", "31/05/98", "07/06/98"],
        "Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20 ,15.07 ,12.59]
    }
)

q = dataset.lazy().with_column(pl.col("Date").str.strptime(pl.Date, fmt="%d/%m/%y"))

df = q.collect()
df = df.with_columns([(
    pl.col("Close").pct_change().alias("Close Returns")
)])

lag_vector = [2, 6, 7, 10, 12, 13]

for lag in lag_vector:
    out = df.groupby_rolling(
        index_column="Date", period=f"{lag}w"
    ).agg([
        pl.col("Close Returns").filter(pl.col("Close Returns") >= 0).mean().alias("positive mean"),
        pl.col("Close Returns").filter(pl.col("Close Returns") < 0).mean().alias("negative mean"),
    ])
    out["negative mean"] = out["negative mean"].fill_null("zero")
    out["positive mean"] = out["positive mean"].fill_null("zero")
    out = out.with_columns([
        (pl.col("positive mean") / (pl.col("positive mean") - pl.col("negative mean"))).alias(f"{lag} lag mean"),
    ])
    df = df.join(out.select(["Date", f"{lag} lag mean"]), left_on="Date", right_on="Date")
Sigi
  • 53
  • 8
  • It might be good to add data that has both negative and positive values, as that is what you are asking a solution for. It might also help to add the expected outcome so that question is easier to understand. – ritchie46 Mar 31 '22 at 09:57
  • Sorry, it was a wrong copy/paste. I fixed and add an image with the result df – Sigi Mar 31 '22 at 10:01

2 Answers2

1

Edit: I've tweaked my answer to use the any expression so that the non-negative windowed mean is calculated if any (rather than all) of the values in the window is non-negative. Likewise, for the negative windowed mean.

lag_vector = [1, 2, 3]
for lag in lag_vector:
    out = (
        df
        .groupby_rolling(index_column="Date", period=f"{lag}w").agg(
            [
                pl.col('Close Returns').alias('Close Returns list'),
                pl.when((pl.col("Close Returns") >= 0).any())
                .then(pl.col('Close Returns').filter(pl.col("Close Returns") >= 0).mean())
                .otherwise(0)
                .alias("positive mean"),
                pl.when((pl.col("Close Returns") < 0).any())
                .then(pl.col('Close Returns').filter(pl.col("Close Returns") < 0).mean())
                .otherwise(0)
                .alias("negative mean"),
            ]
        )
    )

    print(out)

Window size 1 week:

shape: (9, 4)
┌────────────┬────────────────────┬───────────────┬───────────────┐
│ Date       ┆ Close Returns list ┆ positive mean ┆ negative mean │
│ ---        ┆ ---                ┆ ---           ┆ ---           │
│ date       ┆ list [f64]         ┆ f64           ┆ f64           │
╞════════════╪════════════════════╪═══════════════╪═══════════════╡
│ 1998-04-12 ┆ [null]             ┆ 0.0           ┆ 0.0           │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-04-19 ┆ [-0.023933]        ┆ 0.0           ┆ -0.023933     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-04-26 ┆ [0.0689]           ┆ 0.0689        ┆ 0.0           │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-03 ┆ [-0.061996]        ┆ 0.0           ┆ -0.061996     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-10 ┆ [-0.043622]        ┆ 0.0           ┆ -0.043622     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-17 ┆ [0.021424]         ┆ 0.021424      ┆ 0.0           │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-24 ┆ [0.028417]         ┆ 0.028417      ┆ 0.0           │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-31 ┆ [-0.008553]        ┆ 0.0           ┆ -0.008553     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-06-07 ┆ [-0.164565]        ┆ 0.0           ┆ -0.164565     │
└────────────┴────────────────────┴───────────────┴───────────────┘

Window size 2 weeks:

shape: (9, 4)
┌────────────┬────────────────────────┬───────────────┬───────────────┐
│ Date       ┆ Close Returns list     ┆ positive mean ┆ negative mean │
│ ---        ┆ ---                    ┆ ---           ┆ ---           │
│ date       ┆ list [f64]             ┆ f64           ┆ f64           │
╞════════════╪════════════════════════╪═══════════════╪═══════════════╡
│ 1998-04-12 ┆ [null]                 ┆ 0.0           ┆ 0.0           │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-04-19 ┆ [null, -0.023933]      ┆ 0.0           ┆ -0.023933     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-04-26 ┆ [-0.023933, 0.0689]    ┆ 0.0689        ┆ -0.023933     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-03 ┆ [0.0689, -0.061996]    ┆ 0.0689        ┆ -0.061996     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-10 ┆ [-0.061996, -0.043622] ┆ 0.0           ┆ -0.052809     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-17 ┆ [-0.043622, 0.021424]  ┆ 0.021424      ┆ -0.043622     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-24 ┆ [0.021424, 0.028417]   ┆ 0.0249        ┆ 0.0           │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-31 ┆ [0.028417, -0.008553]  ┆ 0.028417      ┆ -0.008553     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-06-07 ┆ [-0.008553, -0.164565] ┆ 0.0           ┆ -0.086559     │
└────────────┴────────────────────────┴───────────────┴───────────────┘

Window size 3 weeks:

shape: (9, 4)
┌────────────┬──────────────────────────────────┬───────────────┬───────────────┐
│ Date       ┆ Close Returns list               ┆ positive mean ┆ negative mean │
│ ---        ┆ ---                              ┆ ---           ┆ ---           │
│ date       ┆ list [f64]                       ┆ f64           ┆ f64           │
╞════════════╪══════════════════════════════════╪═══════════════╪═══════════════╡
│ 1998-04-12 ┆ [null]                           ┆ 0.0           ┆ 0.0           │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-04-19 ┆ [null, -0.023933]                ┆ 0.0           ┆ -0.023933     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-04-26 ┆ [null, -0.023933, 0.0689]        ┆ 0.0689        ┆ -0.023933     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-03 ┆ [-0.023933, 0.0689, -0.061996]   ┆ 0.0689        ┆ -0.042965     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-10 ┆ [0.0689, -0.061996, -0.043622]   ┆ 0.0689        ┆ -0.052809     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-17 ┆ [-0.061996, -0.043622, 0.021424] ┆ 0.021424      ┆ -0.052809     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-24 ┆ [-0.043622, 0.021424, 0.028417]  ┆ 0.0249        ┆ -0.043622     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-05-31 ┆ [0.021424, 0.028417, -0.008553]  ┆ 0.0249        ┆ -0.008553     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1998-06-07 ┆ [0.028417, -0.008553, -0.164565] ┆ 0.028417      ┆ -0.086559     │
└────────────┴──────────────────────────────────┴───────────────┴───────────────┘

Is this closer to what you are looking for?

  • Fair enough but I don't need all element in the windows with same sign. I need at least one! If none or null, return zero. Otherwise a mean for positive and another for negative. – Sigi Mar 31 '22 at 21:31
  • I see. Let me revise my answer. It's an easy change. –  Mar 31 '22 at 21:41
  • That's great and better than mine! Thank you! – Sigi Apr 01 '22 at 08:03
0

You can use groupby_rolling and then in the aggregation filter out values that are negative.

In the example below, we parse the dates and then groupby a window of 10 days ("10d"), finally we aggregate by our conditions.

df = pl.DataFrame(
    {
        "Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98",],
        "Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20]
    }
)

(df.with_column(pl.col("Date").str.strptime(pl.Date, fmt="%d/%m/%y"))
   .groupby_rolling(index_column="Date", period="10d")
   .agg([
       pl.col("Close").filter(pl.col("Close") > 0).mean().alias("mean")
   ])
)
shape: (7, 2)
┌────────────┬────────┐
│ Date       ┆ mean   │
│ ---        ┆ ---    │
│ date       ┆ f64    │
╞════════════╪════════╡
│ 1998-04-12 ┆ 15.46  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1998-04-19 ┆ 15.275 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1998-04-26 ┆ 15.61  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1998-05-03 ┆ 15.63  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1998-05-10 ┆ 14.8   │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1998-05-17 ┆ 14.625 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1998-05-24 ┆ 14.99  │
└────────────┴────────┘

ritchie46
  • 10,405
  • 1
  • 24
  • 43
  • Actually, this doesn't help me. I improved original question by adding an example. In particular I need to return zero if the rolling window doesn't contain enough data. If I have a window of 2 and in that window values are all positive, than return zero for that value. – Sigi Mar 31 '22 at 10:43