I have two data frames with different resolution MultiIndex values. The first data frame tracks state
, year
, and hour
variables, while the second data frame only tracks state
and hour
variables. How can I join the two data frames such that the second data frame is repeated for each year? Here is a small reprex:
#!/usr/bin/env python
import pandas as pd
import numpy as np
YEARS = np.arange(2015, 2055, 5)
HOURS = np.arange(1, 8761, 1)
idx0 = pd.MultiIndex.from_product([['NY'], YEARS, HOURS], names=['state', 'year', 'hour'])
df0 = pd.DataFrame(
np.random.rand(70080, 1),
columns=['TOTALLOAD'],
index=idx0
)
# TOTALLOAD
# state year hour
# NY 2015 1 0.283680
# 2 0.475532
# 3 0.309519
# 4 0.360841
# 5 0.835909
# ... ...
# 2050 8756 0.152357
# 8757 0.431416
# 8758 0.694271
# 8759 0.766727
# 8760 0.365636
# [70080 rows x 1 columns]
idx1 = pd.MultiIndex.from_product([['NY'], HOURS], names=['state', 'hour'])
df1 = pd.DataFrame(
np.random.rand(8760, 2),
columns=['WIND', 'SOLAR'],
index=idx1
)
# WIND SOLAR
# state hour
# NY 1 0.835902 0.488313
# 2 0.066278 0.137335
# 3 0.600126 0.526955
# 4 0.661116 0.464443
# 5 0.547492 0.840770
# ... ... ...
# 8756 0.786757 0.829725
# 8757 0.753301 0.985739
# 8758 0.856252 0.196507
# 8759 0.322048 0.650752
# 8760 0.798451 0.050394
# [8760 rows x 2 columns]
# Expected Dataframe
df2 = pd.DataFrame(
np.random.rand(70080, 3),
columns=['TOTALLOAD', 'WIND', 'SOLAR'],
index=idx0
)
# TOTALLOAD WIND SOLAR
# state year hour
# NY 2015 1 0.108648 0.767488 0.391436
# 2 0.012284 0.616722 0.314657
# 3 0.954200 0.666713 0.112874
# 4 0.078314 0.170395 0.697167
# 5 0.893799 0.356095 0.706867
# ... ... ... ...
# 2050 8756 0.233540 0.183593 0.178483
# 8757 0.473917 0.662912 0.135068
# 8758 0.124253 0.784875 0.528687
# 8759 0.350463 0.107292 0.446881
# 8760 0.223953 0.689475 0.425816
#
# [70080 rows x 3 columns]