I'll assume that you are starting with a one-row Polars DataFrame. If not, you can easily adapt the algorithm below by starting with the initial timestamp (rather than retrieving it from a one-row DataFrame).
From a performance standpoint, it's best to construct a new DataFrame in one step, rather than repeatedly extending/growing/appending to the one-row DataFrame.
In addition, rather than using slow Python functions, I suggest using two performant Polars functions, polars.arange and polars.repeat, to construct the timestamp and Counter Series for the new DataFrame.
Starting with this data:
import polars as pl

one_row_df = pl.DataFrame(
    {
        "timestamp": [164323232],
        "Counter": [2],
    }
)
one_row_df
shape: (1, 2)
┌───────────┬─────────┐
│ timestamp ┆ Counter │
│ ---       ┆ ---     │
│ i64       ┆ i64     │
╞═══════════╪═════════╡
│ 164323232 ┆ 2       │
└───────────┴─────────┘
I'll express the algorithm as a function.
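def create_dummy_df(orig_df: pl.DataFrame, nbr_rows: int) -> pl.DataFrame:
    # Take the starting timestamp from the first (only) row.
    initial_ts = orig_df.get_column("timestamp")[0]
    nbr_secs_in_day = 24 * 60 * 60

    result = pl.DataFrame(
        {
            # One timestamp per day; the end bound is exclusive.
            # The bounds are passed positionally because their keyword
            # names differ between Polars versions.
            "timestamp": pl.arange(
                initial_ts,
                initial_ts + (nbr_rows * nbr_secs_in_day),
                step=nbr_secs_in_day,
                eager=True,
            ),
            # Counter is zero in every generated row.
            "Counter": pl.repeat(0, n=nbr_rows, eager=True),
        }
    )
    return result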
create_dummy_df(one_row_df, 100)
shape: (100, 2)
┌───────────┬─────────┐
│ timestamp ┆ Counter │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═══════════╪═════════╡
│ 164323232 ┆ 0 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 164409632 ┆ 0 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 164496032 ┆ 0 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 164582432 ┆ 0 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 172617632 ┆ 0 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 172704032 ┆ 0 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 172790432 ┆ 0 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 172876832 ┆ 0 │
└───────────┴─────────┘