Background
I have a large dataframe consisting of 100's of thousands of values. The head of dataframe looks as below
df = pd.DataFrame([np.nan, 1100, 1400, np.nan, 14000],
index=pd.to_datetime(["2011-05-25 10:00:00",
"2011-05-25 16:40:00",
"2011-05-25 17:06:00",
"2011-05-25 17:10:00",
"2011-05-25 17:24:00"])
0
2011-05-25 10:00:00 NaN
2011-05-25 16:40:00 1100.0
2011-05-25 17:06:00 1400.0
2011-05-25 17:10:00 NaN
2011-05-25 17:24:00 14000.0
what I want
The values are not always recorded with 6 minute time step. I want to shift the values which are not recorded at 6 minute time step to the nearest 6 minute step. I want the new dataframe to look like below
n_df = pd.DataFrame([np.nan, 1100, 1400, np.nan, 14000],
index=pd.to_datetime(["2011-05-25 10:00:00",
"2011-05-25 16:42:00",
"2011-05-25 17:06:00",
"2011-05-25 17:12:00",
"2011-05-25 17:24:00"])
)
0
2011-05-25 10:00:00 NaN
2011-05-25 16:42:00 1100.0
2011-05-25 17:06:00 1400.0
2011-05-25 17:12:00 NaN
2011-05-25 17:24:00 14000.0
What is important for me that all the values in n_df should be at 6 minute time step and thus the attribute n_df.index.freq
must not be None
.
How can I achieve this.
So far I am doing it using for
loop by iterating over df
and finding the nearest 6 minute step and shifting/copying the value to that step, but this is very slow if df
is larger than 1000.
what I have tried
index = pd.date_range(df.index[0], end=df.index[-1], freq="6min")
pydatetime_index = index.to_pydatetime()
n_df = pd.DataFrame(columns=df.columns, index=index)
for _idx, i in enumerate(df.index):
nearest_neighbor = np.abs(pydatetime_index - i.to_pydatetime())
idx = np.argmin(nearest_neighbor)
val = df.loc[i]
n_df.iloc[idx] = val