0

I have hourly data on electricity consumption for a specific day. I would like to use this data to "predict" the hourly electricity consumption for the following days. The value for the following day should be the value from the same hour the day before, multiplied by a scaling factor f (e.g. 2).

The dataframe df that I have looks something like this:

                       load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
2021-01-01 05:00:00   1.0
2021-01-01 06:00:00   1.0
2021-01-01 07:00:00   3.0
2021-01-01 08:00:00   3.0
2021-01-01 09:00:00   3.0
2021-01-01 10:00:00   3.0
2021-01-01 11:00:00   3.0
2021-01-01 12:00:00   3.0
2021-01-01 13:00:00   3.0
2021-01-01 14:00:00   3.0
2021-01-01 15:00:00   3.0
2021-01-01 16:00:00   3.0
2021-01-01 17:00:00   3.0
2021-01-01 18:00:00   3.0
2021-01-01 19:00:00   3.0
2021-01-01 20:00:00   1.0
2021-01-01 21:00:00   1.0
2021-01-01 22:00:00   1.0
2021-01-01 23:00:00   1.0

And I would like the output dataframe df_ex to look something like this:

                       load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
2021-01-01 05:00:00   1.0
2021-01-01 06:00:00   1.0
2021-01-01 07:00:00   3.0
2021-01-01 08:00:00   3.0
2021-01-01 09:00:00   3.0
2021-01-01 10:00:00   3.0
2021-01-01 11:00:00   3.0
2021-01-01 12:00:00   3.0
2021-01-01 13:00:00   3.0
2021-01-01 14:00:00   3.0
2021-01-01 15:00:00   3.0
2021-01-01 16:00:00   3.0
2021-01-01 17:00:00   3.0
2021-01-01 18:00:00   3.0
2021-01-01 19:00:00   3.0
2021-01-01 20:00:00   1.0
2021-01-01 21:00:00   1.0
2021-01-01 22:00:00   1.0
2021-01-01 23:00:00   1.0
2021-01-02 00:00:00   2.0
2021-01-02 01:00:00   2.0
2021-01-02 02:00:00   2.0
2021-01-02 03:00:00   2.0
2021-01-02 04:00:00   2.0
2021-01-02 05:00:00   2.0
2021-01-02 06:00:00   2.0
2021-01-02 07:00:00   6.0
2021-01-02 08:00:00   6.0
2021-01-02 09:00:00   6.0
2021-01-02 10:00:00   6.0
2021-01-02 11:00:00   6.0
2021-01-02 12:00:00   6.0
2021-01-02 13:00:00   6.0
2021-01-02 14:00:00   6.0
2021-01-02 15:00:00   6.0
2021-01-02 16:00:00   6.0
2021-01-02 17:00:00   6.0
2021-01-02 18:00:00   6.0
2021-01-02 19:00:00   6.0
2021-01-02 20:00:00   2.0
2021-01-02 21:00:00   2.0
2021-01-02 22:00:00   2.0
2021-01-02 23:00:00   2.0
2021-01-03 00:00:00   4.0
2021-01-03 01:00:00   4.0
2021-01-03 02:00:00   4.0
2021-01-03 03:00:00   4.0
2021-01-03 04:00:00   4.0
2021-01-03 05:00:00   4.0
2021-01-03 06:00:00   4.0
2021-01-03 07:00:00   12.0
2021-01-03 08:00:00   12.0
2021-01-03 09:00:00   12.0
2021-01-03 10:00:00   12.0
2021-01-03 11:00:00   12.0
2021-01-03 12:00:00   12.0
2021-01-03 13:00:00   12.0
2021-01-03 14:00:00   12.0
2021-01-03 15:00:00   12.0
2021-01-03 16:00:00   4.0
2021-01-03 17:00:00   4.0
2021-01-03 18:00:00   4.0
2021-01-03 19:00:00   4.0
2021-01-03 20:00:00   4.0
2021-01-03 21:00:00   4.0
2021-01-03 22:00:00   4.0
2021-01-03 23:00:00   4.0

I have tried the following solution (df as defined above):

import pandas as pd
import datetime

start = '2021-01-01 00:00'
end = '2021-01-03 23:00'
freq = 'H'

index = pd.date_range(start,
                      end,
                      freq=freq)

df_ex = df.reindex(index)

i = df_ex.index[0].day
f = 2.0
df_ex.loc[df_ex.index.day == i+1] = df_ex.loc[df_ex.index.day == i] * f

print(df_ex)

Which results in:

                      load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
...                   ...
2021-01-03 19:00:00   NaN
2021-01-03 20:00:00   NaN
2021-01-03 21:00:00   NaN
2021-01-03 22:00:00   NaN
2021-01-03 23:00:00   NaN

It appears that my attempt to populate the rows after the first day with values did not succeed. The index is a DateTimeIndex.

Any suggestions on how to solve this would be greatly appreciated!

Malte
  • 3
  • 2

2 Answers2

0

To create the data you need to iterate one day at a time.

Assuming that the original data has at least a full day of data then you can do:

import pandas as pd
import itertools
import datetime as dt

start = "2021-01-01 00:00"
end = "2021-01-01 23:00"
freq = "H"

df = pd.DataFrame(
    {"load_kWh": itertools.chain([1.0] * 7, [3.0] * 13, [1.0] * 4)},
    index=pd.date_range(start, end, freq=freq),
)


def add_days_to_df(data: pd.DataFrame, number_of_days: int, k: float) -> pd.DataFrame:
    data = data.copy()
    for _ in range(number_of_days):
        day = data[-24:]
        day.index += dt.timedelta(days=1)
        day *= k
        data = pd.concat((data, day))
    return data


print(add_days_to_df(data=df, number_of_days=2, k=2.0))
Matteo Zanoni
  • 3,429
  • 9
  • 27
  • Great! Thank you so much. The solution is working for me. Do you happen to have an idea on how to generalize this from days to years? I have input data (hourly) for a full year, and want to extrapolate to multiple years. With leap years, I'm afraid just switching everything from 24 to 8760 and days to years won't work 100 %. – Malte Nov 14 '22 at 15:00
0

I managed to get a partial solution, which works for years instead of days (copy/scale the data from the same day of the previous year). It's only a partial solution because leap years are not taken into account yet.

def add_years_to_df(data: pd.DataFrame, target_year: int, k: float) -> pd.DataFrame:
base_year = data.index[0].year
i = base_year+1
add = data.copy()
for _ in range (base_year+1, target_year+1):
    add = k*add
    add.index = add.index.map(lambda t: t.replace(year=i))
    data = pd.concat((data, add))
    i += 1
return data

lp is the input dataframe as stated in my original question. target_year is the year until which the data is to be extrapolated. k is the multiplier.

To call on the function, enter e.g. add_years_to_df(data=lp, target_year = 2030, k=1.1)

This results in:

Datetime                load_kWh
2021-01-01 00:00:00     77.987500
2021-01-01 01:00:00     78.116667
2021-01-01 02:00:00     79.383333
2021-01-01 03:00:00     79.070833
2021-01-01 04:00:00     78.275000
...     ...
2030-12-31 19:00:00     247.373361
2030-12-31 20:00:00     74.889393
2030-12-31 21:00:00     71.883018
2030-12-31 22:00:00     73.101291
2030-12-31 23:00:00     72.438118
Malte
  • 3
  • 2