I have a DataFrame with a DatetimeIndex of values that are spaced 7 days apart, going back to 1971.
In [174]: df_test
Out[174]:
                      ID  VALUE
DATE
1971-04-02  MORTGAGE30US   7.33
1971-04-09  MORTGAGE30US   7.31
1971-04-16  MORTGAGE30US   7.31
1971-04-23  MORTGAGE30US   7.31
1971-04-30  MORTGAGE30US   7.29
...                  ...    ...
2023-02-09  MORTGAGE30US   6.12
2023-02-16  MORTGAGE30US   6.32
2023-02-23  MORTGAGE30US   6.50
2023-03-02  MORTGAGE30US   6.65
2023-03-09  MORTGAGE30US   6.73
[2711 rows x 2 columns]
In [175]: df_test.index
Out[175]:
DatetimeIndex(['1971-04-02', '1971-04-09', '1971-04-16', '1971-04-23',
               '1971-04-30', '1971-05-07', '1971-05-14', '1971-05-21',
               '1971-05-28', '1971-06-04',
               ...
               '2023-01-05', '2023-01-12', '2023-01-19', '2023-01-26',
               '2023-02-02', '2023-02-09', '2023-02-16', '2023-02-23',
               '2023-03-02', '2023-03-09'],
              dtype='datetime64[ns]', name='DATE', length=2711, freq=None)
Pandas does not recognize a frequency, as indicated by freq=None, so I will try to infer one...
In [176]: pd.infer_freq(df_test.index)
But none can be inferred.
There are a lot of data points here, so let's make sure that this really is evenly spaced data with no missing values...
In [177]: df_test.reset_index(inplace=True)
In [178]: df_test['DATE'] - df_test['DATE'].shift()
Out[178]:
0          NaT
1       7 days
2       7 days
3       7 days
4       7 days
         ...
2706    7 days
2707    7 days
2708    7 days
2709    7 days
2710    7 days
Name: DATE, Length: 2711, dtype: timedelta64[ns]
In [179]: (df_test['DATE'] - df_test['DATE'].shift()).all()
Out[179]: True
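A caveat on this check: as far as I understand, `.all()` only tests that every gap is truthy (non-zero), so it would also return True if some gaps were 6 or 8 days. A stricter check counts the distinct gaps. The sketch below uses a small made-up `dates` series (an assumption, standing in for `df_test['DATE']`) with one deliberately short gap:

```python
import pandas as pd

# Hypothetical stand-in for df_test['DATE']: weekly dates with one 6-day gap.
dates = pd.Series(
    pd.to_datetime(["2023-01-06", "2023-01-13", "2023-01-19", "2023-01-26"]),
    name="DATE",
)

# Same idea as dates - dates.shift(), with the leading NaT dropped.
gaps = dates.diff().dropna()

# How often each distinct spacing occurs.
gap_counts = gaps.value_counts()

# True only if every gap is identical.
evenly_spaced = gaps.nunique() == 1

print(gap_counts)
print(evenly_spaced)
```

If `evenly_spaced` comes back False, `gap_counts` shows which spacings break the pattern and how often.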
That suggests I have a regular, repeating frequency of datetime values spaced 7 days apart. Why doesn't Pandas recognize this frequency, then?
Ultimately, what I'm trying to do is resample so that the index does have a frequency. I can specify the frequency manually:
df_test = df_test.resample('W').max()
and that will work, but I won't always be working with weekly data (sometimes it's daily, sometimes it's monthly or yearly, etc.), so I would like a generic method. My idea was to infer the frequency from the index and use the result to resample:
df_test = df_test.resample(pd.infer_freq(df_test.index)).max()
But that doesn't work, since Pandas cannot infer a frequency for this weekly data.
== EDIT ==
Upon further inspection, it appears that the day of the week is not the same for the duration of the series. In fact, there are several different days of the week throughout, which would of course confuse pd.infer_freq().
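For reference, this is roughly how the day-of-week spread can be checked. The `idx` below is a small made-up index (an assumption, standing in for `df_test.index`) whose observations move from Fridays to Thursdays:

```python
import pandas as pd

# Hypothetical stand-in for df_test.index: two Fridays followed by three Thursdays.
idx = pd.DatetimeIndex(
    ["2023-01-06", "2023-01-13", "2023-01-19", "2023-01-26", "2023-02-02"],
    name="DATE",
)

# Number of observations falling on each weekday.
dow_counts = idx.day_name().value_counts()
print(dow_counts)

# With more than one weekday present, no single weekly rule fits the whole index.
inferred = pd.infer_freq(idx)
print(inferred)
```

More than one weekday in `dow_counts` means the gaps are not all exactly 7 days, which matches `pd.infer_freq()` giving up on the index.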
So I understand now why that method would return None, but I would like to find a solution that could better identify this weekly frequency, if any.
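One direction I have been considering, as a rough sketch rather than a confirmed solution: take the median gap between timestamps and map it to a coarse resampling rule. The helper name `guess_resample_rule`, its thresholds, and the sample values are all my own assumptions for illustration; only `resample`, `diff`, and `median` are real pandas calls.

```python
import pandas as pd


def guess_resample_rule(index: pd.DatetimeIndex) -> str:
    """Hypothetical helper: map the median gap between timestamps to a coarse rule."""
    median_days = index.to_series().diff().median() / pd.Timedelta(days=1)
    if median_days <= 1.5:
        return "D"    # roughly daily
    if median_days <= 10:
        return "W"    # roughly weekly
    if median_days <= 45:
        return "MS"   # roughly monthly (month-start buckets)
    return "YS"       # anything coarser is treated as yearly in this sketch


# Made-up data: weekly observations whose weekday shifts from Friday to Thursday.
idx = pd.DatetimeIndex(
    ["2023-01-06", "2023-01-13", "2023-01-19", "2023-01-26"], name="DATE"
)
df = pd.DataFrame({"VALUE": [1.0, 2.0, 3.0, 4.0]}, index=idx)

rule = guess_resample_rule(df.index)
weekly = df.resample(rule).max()
print(rule, len(weekly))
```

The median tolerates the occasional 6- or 8-day gap, which is what trips up `pd.infer_freq()`, but the thresholds are crude (quarterly data, for example, would be lumped in with yearly), so I would treat this as a starting point only.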