Generate daily time series date from monthly usage with python

Question

I have about two years of monthly gas usage for a city and want to generate daily use concerning daily usage sum equal to monthly and keep time-series shape, but I don't know how to do that.

Here is my data Link [1]

To clarify, you want to interpolate the daily usage from monthly usage? Additionally, please re-post the source data as *text* not an image. An image is very difficult to copy and paste. — S3DEV, Apr 20 '22 at 07:37
Yes, I want to interpolate the daily usage from monthly usage. Here is the usage data link: shorturl.at/syDJ0 — Milad Firoozeh, Apr 20 '22 at 08:00
Please provide enough code so others can better understand or reproduce the problem. — Community, Apr 21 '22 at 01:12
Great, thank you for the confirmation and the dataset. Please see the answer below. — S3DEV, Apr 21 '22 at 10:54

score 2 · Accepted Answer · answered Apr 21 '22 at 10:51

The following code sample demonstrates date and data interpolation using pandas.

The following steps are taken:

Using the provided dataset, read this into a DataFrame.
Calculate a cumulative sum of usage data.
Set the DataFrame's index as the date, to facilitate date resampling.
Resample for dates to a daily frequency.
Calculate the daily usage.

Example code:

# Read the CSV and convert dates to a datetime object.
path = '~/Downloads/usage.csv'
df = pd.read_csv(path, 
                 header=0, 
                 names=['date', 'gas_usage'], 
                 converters={'date': pd.to_datetime})
# Calculate a cumulative sum to be interpolated.
df['gas_usage_c'] = df['gas_usage'].cumsum()
# Move the date to the index, for resampling.
df.set_index('date', inplace=True)

# Resample the data to a daily ('D') frequency.
df2 = df.resample('D').interpolate('time')
# Calculate the daily usage.
df2['daily_usage'] = df2['gas_usage_c'].diff()

Sample output of `df2`:

               gas_usage   gas_usage_c   daily_usage
date                                                
2016-03-20  3.989903e+07  3.989903e+07           NaN
2016-03-21  3.932781e+07  4.061487e+07  7.158445e+05
2016-03-22  3.875659e+07  4.133072e+07  7.158445e+05
                 ...           ...           ...
2018-02-18  4.899380e+07  7.967041e+08  1.598856e+06
2018-02-19  4.847973e+07  7.983029e+08  1.598856e+06
2018-02-20  4.796567e+07  7.999018e+08  1.598856e+06

[703 rows x 3 columns]

Visual confirmation

I've included two simple graphs to illustrate the dataset alignment and interpolation.

Plotting code:

For completeness, the rough plotting code is included below.

from plotly.offline import plot

plot({'data': [{'x': df.index, 
                'y': df['gas_usage'], 
                'type': 'bar'}], 
      'layout': {'title': 'Original',
                 'template': 'plotly_dark'}})
plot({'data': [{'x': df2.index, 
                'y': df2['daily_usage'], 
                'type': 'bar'}], 
      'layout': {'title': 'Interpolated',
                 'template': 'plotly_dark'}})

Thanks for the code. If I want to have more natural usage (most days have the same numbers, I want different usage per day) what should I do? — Milad Firoozeh, Apr 21 '22 at 18:20
My pleasure. If you want more ‘natural’ usage, the sample rate of the recording device would need to be increased. If you think it through; for example, over 10 days the usage is 1000, that interpolates to 100 units per day, and *all* days will be assigned an average usage of 100 units. — S3DEV, Apr 21 '22 at 20:19
I have 6 years of monthly data. How to increase the sample rate? My second question is, the data generated with your code has a Nan cell in the first row; how to fill it? — Milad Firoozeh, Apr 22 '22 at 06:08

Generate daily time series date from monthly usage with python

1 Answers1

Example code:

Sample output of df2:

Visual confirmation

Plotting code:

Sample output of `df2`: