How to create a polars dataframe on the basis of previous row

Question

I'm trying to create a Polars DataFrame in Python. The DataFrame format is:

timestamp(secs)	Counter
164323232	2

I'm given only the first row. Now I need to create a dummy DataFrame (say, 100 rows) based on this first row. Consecutive rows should be one day apart, and the counter should be zero.

score 0 · Answer 1 · 2022-07-12T20:48:19.933

I'll assume that you are starting with a one-row Polars DataFrame. If not, you can easily adapt the algorithm below by starting with the initial timestamp (rather than retrieving it from a one-row DataFrame).

From a performance standpoint, it's best to construct a new DataFrame rather than extending, growing, or appending to the one-row DataFrame.

In addition, rather than generating the values with slower pure-Python code, I suggest using two performant Polars functions, polars.arange and polars.repeat, to construct the timestamp and Counter Series for the new DataFrame.

Starting with this data:

import polars as pl

one_row_df = pl.DataFrame(
    {
        "timestamp": [164323232],
        "Counter": [2],
    }
)
one_row_df

shape: (1, 2)
┌───────────┬─────────┐
│ timestamp ┆ Counter │
│ ---       ┆ ---     │
│ i64       ┆ i64     │
╞═══════════╪═════════╡
│ 164323232 ┆ 2       │
└───────────┴─────────┘

I'll express the algorithm as a function.

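def create_dummy_df(orig_df: pl.DataFrame, nbr_rows: int) -> pl.DataFrame:
    # Take the starting timestamp (in seconds) from the first row.
    initial_ts = orig_df.get_column("timestamp")[0]
    nbr_secs_in_day = 24 * 60 * 60
    result = pl.DataFrame(
        {
            # Newer Polars releases replace pl.arange with pl.int_range.
            # The end is exclusive, so this yields exactly nbr_rows
            # timestamps, each one day apart.
            "timestamp": pl.int_range(
                start=initial_ts,
                end=initial_ts + (nbr_rows * nbr_secs_in_day),
                step=nbr_secs_in_day,
                eager=True,
            ),
            # Every dummy row gets a Counter of 0.
            "Counter": pl.repeat(0, n=nbr_rows, dtype=pl.Int64, eager=True),
        }
    )
    return result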

>>> create_dummy_df(one_row_df, 100)
shape: (100, 2)
┌───────────┬─────────┐
│ timestamp ┆ Counter │
│ ---       ┆ ---     │
│ i64       ┆ i64     │
╞═══════════╪═════════╡
│ 164323232 ┆ 0       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 164409632 ┆ 0       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 164496032 ┆ 0       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 164582432 ┆ 0       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ ...       ┆ ...     │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 172617632 ┆ 0       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 172704032 ┆ 0       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 172790432 ┆ 0       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 172876832 ┆ 0       │
└───────────┴─────────┘