I am looking to return a DataFrame containing the last row (the row with the most recent date in the index) of each group, after the second level of the MultiIndex has been filtered by a logical indexing condition.
Here is a toy example to illustrate:
import numpy as np
import pandas as pd
from datetime import datetime
dates = pd.date_range(start='1/1/2018', end='1/4/2018').to_pydatetime().tolist() * 2
ids = ['z7321', 'z7321', 'z7321', 'z7321', 'b2134', 'b2134', 'b2134', 'b2134']
arrays = [ids, dates]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['key', 'date'])
df = pd.DataFrame(data=np.random.randn(len(index)), index=index, columns=['change'])
print(df)
                    change
key   date
z7321 2018-01-01 -0.701605
      2018-01-02 -0.934580
      2018-01-03  0.186554
      2018-01-04  0.417024
b2134 2018-01-01  0.682699
      2018-01-02 -0.913633
      2018-01-03  0.330347
      2018-01-04 -0.706429
The condition is to return, for each key, the last row among those selected by df[df.index.get_level_values(1) <= datetime(2018, 1, 2)], i.e. the most recent row dated on or before 2018-01-02.
The desired output looks like this:
                    change
key   date
z7321 2018-01-02 -0.934580
b2134 2018-01-02 -0.913633
Additional Considerations:
- Directly selecting the rows using
df[df.index.get_level_values(1) == datetime(2018, 1, 2)]
isn't an option, since the second index level (the date level) may not contain an exact match for datetime(2018, 1, 2).
- The date level may not contain the same values across the key groups, i.e. 'z7321' could have different dates in the second index level than 'b2134'.
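One possible approach that seems to satisfy both considerations is sketched below; it is an illustration under stated assumptions, not a confirmed solution. The idea is to filter on the date level first, then take the last remaining row of each key with groupby(...).tail(1). The data here is hypothetical and differs from the toy example above: the two keys are deliberately given different dates, and 'b2134' has no exact 2018-01-02 entry. The names cutoff, filtered and result are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the keys have different dates, and 'b2134'
# has no row dated exactly 2018-01-02.
index = pd.MultiIndex.from_tuples(
    [
        ("z7321", pd.Timestamp("2018-01-01")),
        ("z7321", pd.Timestamp("2018-01-02")),
        ("z7321", pd.Timestamp("2018-01-04")),
        ("b2134", pd.Timestamp("2017-12-30")),
        ("b2134", pd.Timestamp("2018-01-01")),
        ("b2134", pd.Timestamp("2018-01-03")),
    ],
    names=["key", "date"],
)
# Deterministic values (0.0 .. 5.0) so the selected rows are easy to identify.
df = pd.DataFrame({"change": np.arange(len(index), dtype=float)}, index=index)

cutoff = pd.Timestamp("2018-01-02")  # assumed cutoff, taken from the question

# Step 1: keep only rows dated on or before the cutoff.
filtered = df[df.index.get_level_values("date") <= cutoff]

# Step 2: sort by (key, date) so the last row of each key is its most
# recent one, then take that row per key. Note that sort_index() also
# reorders the keys alphabetically.
result = filtered.sort_index().groupby(level="key").tail(1)

print(result)
```

On this data the sketch selects ('b2134', 2018-01-01) and ('z7321', 2018-01-02), so each key gets its own most recent date at or before the cutoff. If the dates are already in ascending order within each key, the sort_index() call could be dropped and sort=False passed to groupby to preserve the original key order.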