I have a pandas
DataFrame
like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'date': pd.to_datetime(['2021-06-06']*3 + ['2021-06-07']*3 + ['2021-06-09']*3),
'cat': ['a', 'b', 'c']*3,
'value': np.random.randn(9)
})
# date cat value
# 0 2021-06-06 a 1.182969
# 1 2021-06-06 b 0.460896
# 2 2021-06-06 c 0.859053
# 3 2021-06-07 a 0.259080
# 4 2021-06-07 b -1.333531
# 5 2021-06-07 c 1.258963
# 6 2021-06-09 a 0.254633
# 7 2021-06-09 b 0.250557
# 8 2021-06-09 c 1.363891
I would like to fill in the date(s) that are missing in the date
column, but for all unique values from the cat
column. For value
column, it is okey to leave there NaN
. The following code seems to achieve what I need, but I was just wondering if there is an easier way to do the same.
from itertools import product
df_out = pd.DataFrame(
product(
pd.date_range(df.date.min(), df.date.max(), freq="D"),
df.cat.unique()
)
)\
.set_axis(["date", "cat"], axis=1)\
.merge(df, how="left", on=["date", "cat"])
# date cat value
# 0 2021-06-06 a 1.090784
# 1 2021-06-06 b -0.479246
# 2 2021-06-06 c 1.886213
# 3 2021-06-07 a 1.477877
# 4 2021-06-07 b -0.026864
# 5 2021-06-07 c -1.432626
# 6 2021-06-08 a NaN
# 7 2021-06-08 b NaN
# 8 2021-06-08 c NaN
# 9 2021-06-09 a 0.055170
# 10 2021-06-09 b -2.060127
# 11 2021-06-09 c 0.402082