1

I have about two years of monthly gas usage for a city and want to generate daily use concerning daily usage sum equal to monthly and keep time-series shape, but I don't know how to do that.

Here is my data Link [1]

  • To clarify, you want to interpolate the daily usage from monthly usage? Additionally, please re-post the source data as *text* not an image. An image is very difficult to copy and paste. – S3DEV Apr 20 '22 at 07:37
  • 1
    Yes, I want to interpolate the daily usage from monthly usage. Here is the usage data link: shorturl.at/syDJ0 – Milad Firoozeh Apr 20 '22 at 08:00
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 21 '22 at 01:12
  • 1
    Great, thank you for the confirmation and the dataset. Please see the answer below. – S3DEV Apr 21 '22 at 10:54

1 Answers1

2

The following code sample demonstrates date and data interpolation using pandas.

The following steps are taken:

  • Using the provided dataset, read this into a DataFrame.
  • Calculate a cumulative sum of usage data.
  • Set the DataFrame's index as the date, to facilitate date resampling.
  • Resample for dates to a daily frequency.
  • Calculate the daily usage.

Example code:

# Read the CSV and convert dates to a datetime object.
path = '~/Downloads/usage.csv'
df = pd.read_csv(path, 
                 header=0, 
                 names=['date', 'gas_usage'], 
                 converters={'date': pd.to_datetime})
# Calculate a cumulative sum to be interpolated.
df['gas_usage_c'] = df['gas_usage'].cumsum()
# Move the date to the index, for resampling.
df.set_index('date', inplace=True)

# Resample the data to a daily ('D') frequency.
df2 = df.resample('D').interpolate('time')
# Calculate the daily usage.
df2['daily_usage'] = df2['gas_usage_c'].diff()

Sample output of df2:

               gas_usage   gas_usage_c   daily_usage
date                                                
2016-03-20  3.989903e+07  3.989903e+07           NaN
2016-03-21  3.932781e+07  4.061487e+07  7.158445e+05
2016-03-22  3.875659e+07  4.133072e+07  7.158445e+05
                 ...           ...           ...
2018-02-18  4.899380e+07  7.967041e+08  1.598856e+06
2018-02-19  4.847973e+07  7.983029e+08  1.598856e+06
2018-02-20  4.796567e+07  7.999018e+08  1.598856e+06

[703 rows x 3 columns]

Visual confirmation

I've included two simple graphs to illustrate the dataset alignment and interpolation.

enter image description here

enter image description here

Plotting code:

For completeness, the rough plotting code is included below.

from plotly.offline import plot

plot({'data': [{'x': df.index, 
                'y': df['gas_usage'], 
                'type': 'bar'}], 
      'layout': {'title': 'Original',
                 'template': 'plotly_dark'}})
plot({'data': [{'x': df2.index, 
                'y': df2['daily_usage'], 
                'type': 'bar'}], 
      'layout': {'title': 'Interpolated',
                 'template': 'plotly_dark'}})
S3DEV
  • 8,768
  • 3
  • 31
  • 42
  • Thanks for the code. If I want to have more natural usage (most days have the same numbers, I want different usage per day) what should I do? – Milad Firoozeh Apr 21 '22 at 18:20
  • 1
    My pleasure. If you want more ‘natural’ usage, the sample rate of the recording device would need to be increased. If you think it through; for example, over 10 days the usage is 1000, that interpolates to 100 units per day, and *all* days will be assigned an average usage of 100 units. – S3DEV Apr 21 '22 at 20:19
  • I have 6 years of monthly data. How to increase the sample rate? My second question is, the data generated with your code has a Nan cell in the first row; how to fill it? – Milad Firoozeh Apr 22 '22 at 06:08