Your output is basically the same data, but with an additional index level holding the times rounded down to days. If that's what you are trying to achieve, don't use resample. You don't need it; you can just set a new index:
In[]:
bad_example_df.set_index([bad_example_df.index.floor('D'), bad_example_df.index])
Out[]:
                                return  stock
2020-01-02 2020-01-02 02:43:59   1.003   AMZN
2020-01-03 2020-01-03 12:39:59   1.020   APPL
           2020-01-03 21:42:59   1.060   NVDA
2020-01-04 2020-01-04 02:53:59   1.020   MSFT
           2020-01-04 19:17:59   1.030   AMZN
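Here is a self-contained sketch of the set_index approach. The question's data_dict is not shown in this answer, so the frame below is rebuilt from the printed output; data_dict and index_list are assumptions standing in for the question's own variables:

```python
from datetime import datetime

import pandas as pd

# Assumed data: rebuilt from the printed output, not copied from the question.
data_dict = {
    "return": [1.003, 1.020, 1.060, 1.020, 1.030],
    "stock": ["AMZN", "APPL", "NVDA", "MSFT", "AMZN"],
}
index_list = [
    datetime(2020, 1, 2, 2, 43, 59),
    datetime(2020, 1, 3, 12, 39, 59),
    datetime(2020, 1, 3, 21, 42, 59),
    datetime(2020, 1, 4, 2, 53, 59),
    datetime(2020, 1, 4, 19, 17, 59),
]
bad_example_df = pd.DataFrame(data=data_dict, index=index_list)

# Outer level: each timestamp floored to midnight; inner level: the original timestamp.
by_day = bad_example_df.set_index(
    [bad_example_df.index.floor("D"), bad_example_df.index]
)
print(by_day)
```

Because nothing is aggregated, the columns keep their dtypes. The outer level is a DatetimeIndex of midnights, so selecting one day by its Timestamp, e.g. by_day.loc[pd.Timestamp("2020-01-03")], should return that day's rows.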
OTOH, your lambda makes it look like you are trying to get the first two values for each day. If that's the case, I think apply is not what you want to use (presumably because of the way resample().apply() iterates, see here). Notice that if you shift the first two timestamps back by one day, so that the first one falls on January 1st, you get even worse, unexpected output:
In[]:
from datetime import datetime
import pandas as pd
# data_dict is the same dict as in the question
third_index_list = [datetime(2020, 1, 1, 2, 43, 59), datetime(2020, 1, 2, 12, 39, 59),
                    datetime(2020, 1, 3, 21, 42, 59), datetime(2020, 1, 4, 2, 53, 59),
                    datetime(2020, 1, 4, 19, 17, 59)]
terrible_example_df = pd.DataFrame(data=data_dict, index=third_index_list)
terrible_example_df.resample("D").apply(lambda x: x[:2])
Out[]:
                  return         stock
2020-01-01         1.003          AMZN
2020-01-02          1.02          APPL
2020-01-03          1.06          NVDA
2020-01-04  [1.02, 1.03]  [MSFT, AMZN]
# now the dtype is object, and many operations will fail!
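For contrast, here is a minimal sketch of the kind of call resample() handles predictably: every day is reduced to exactly one value per column. first() is a swapped-in reduction chosen only for illustration (it does not return the first two rows the question asks for), and the data is again rebuilt from the printed output rather than taken from the question:

```python
from datetime import datetime

import pandas as pd

# Assumed data: rebuilt from the printed output, not copied from the question.
data_dict = {
    "return": [1.003, 1.020, 1.060, 1.020, 1.030],
    "stock": ["AMZN", "APPL", "NVDA", "MSFT", "AMZN"],
}
third_index_list = [
    datetime(2020, 1, 1, 2, 43, 59),
    datetime(2020, 1, 2, 12, 39, 59),
    datetime(2020, 1, 3, 21, 42, 59),
    datetime(2020, 1, 4, 2, 53, 59),
    datetime(2020, 1, 4, 19, 17, 59),
]
terrible_example_df = pd.DataFrame(data=data_dict, index=third_index_list)

# Each daily bin collapses to one row: the first "return" and first "stock" of that day.
first_per_day = terrible_example_df.resample("D").first()
print(first_per_day)
print(first_per_day.dtypes)  # "return" keeps its float64 dtype
```

Since every bin yields a single scalar per column, there is nothing for pandas to pack into lists, and the numeric column stays numeric.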
So I think your good_example_df just happens to give the expected output, and your bad_example_df just happens to give unexpected but still usable output; both are probably an improper use of resample().apply(). TBH, I don't understand what apply is doing differently in each example.
Instead, it looks like groupby with groupby().apply() (which is different from resample().apply()!) gives you the output you want consistently, as far as I can tell:
In[]:
bad_example_df.groupby(pd.Grouper(freq='D')).apply(lambda x: x[:2])
Out[]:
                                return  stock
2020-01-02 2020-01-02 02:43:59   1.003   AMZN
2020-01-03 2020-01-03 12:39:59   1.020   APPL
           2020-01-03 21:42:59   1.060   NVDA
2020-01-04 2020-01-04 02:53:59   1.020   MSFT
           2020-01-04 19:17:59   1.030   AMZN
# works for terrible_example_df as well
And, TBH, I don't understand why two index levels are created here, but it seems to work!
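As for the two index levels: the outer level appears to be the group key (the day) that groupby().apply() prepends to the pieces returned for each group, and whether it is added is governed by the group_keys argument of groupby() (its default behavior has changed between pandas versions, so setting it explicitly seems safest). A sketch, again on data rebuilt from the printed output, that passes group_keys=False and, as a swapped-in alternative to the lambda, uses groupby().head(2), which returns the first two rows of each group directly:

```python
from datetime import datetime

import pandas as pd

# Assumed data: rebuilt from the printed output, not copied from the question.
data_dict = {
    "return": [1.003, 1.020, 1.060, 1.020, 1.030],
    "stock": ["AMZN", "APPL", "NVDA", "MSFT", "AMZN"],
}
third_index_list = [
    datetime(2020, 1, 1, 2, 43, 59),
    datetime(2020, 1, 2, 12, 39, 59),
    datetime(2020, 1, 3, 21, 42, 59),
    datetime(2020, 1, 4, 2, 53, 59),
    datetime(2020, 1, 4, 19, 17, 59),
]
terrible_example_df = pd.DataFrame(data=data_dict, index=third_index_list)

# group_keys=False: apply() does not prepend the day key to the result index.
daily = terrible_example_df.groupby(pd.Grouper(freq="D"), group_keys=False)
first_two_apply = daily.apply(lambda g: g.iloc[:2])  # iloc makes the slice explicitly positional

# Swapped-in alternative: head(2) keeps the first two rows of every day,
# with the original index and row order, without apply() at all.
first_two_head = daily.head(2)

print(first_two_apply)
print(first_two_head)
```

With this data every day has at most two rows, so both results should simply reproduce terrible_example_df with a single-level index and a float64 "return" column. If you do want the day as an outer level, the set_index(index.floor("D")) approach from the top of this answer gives it to you explicitly.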