pandas, slice multi-index df with multiple conditions

Question

This question is a continuation of "pandas re-indexing with missing dates".

I want to compute the sum of the values for the most recent 3 months (2015-12, 2015-11, 2015-10). If a stock doesn't have sufficient data, i.e. it has none, 1, or 2 of the 3 months, then I want the value of that sum to be NaN.

I can slice, group by, and sum, but this doesn't give me what I want: the slice drops stocks that have no data in this three-month period, and the sum does not flag stocks that have only 1 or 2 of the months.

I imagine I need a multi loc statement, but I've tinkered around and have not been able to get the results I want.

df2.loc[idx[:,datetime.date(2015,10,1):datetime.date(2015,12,1)],:].groupby(level=0).sum()

Maybe post a chunk of the dataframe so that we can get an idea of how it is structured. — pbreach, Mar 25 '16 at 04:33

score 0 · Answer 1 · answered Mar 25 '16 at 19:43

Try this:

In [142]: df
Out[142]:
    value       date stock
0       4 2015-01-01  amzn
1       2 2015-02-01  amzn
2       5 2015-03-01  amzn
3       6 2015-04-01  amzn
4       7 2015-05-01  amzn
5       8 2015-06-01  amzn
6       6 2015-07-01  amzn
7       5 2015-08-01  amzn
8       4 2015-09-01  amzn
9       1 2015-10-01  amzn
10      2 2015-11-01  amzn
11      4 2015-12-01  amzn
12      7 2015-12-02  amzn

In [143]: df[(df['date'] >= pd.to_datetime('2015-10-01'))][['value']].groupby(df['date'].dt.month).sum()
Out[143]:
      value
date
10        1
11        2
12       11

Note: I intentionally added one row to your DataFrame so that at least one month has more than one row:

In [141]: df.loc[12] = [7, pd.to_datetime('2015-12-02'), 'amzn']
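
Grouping by month alone does not say whether a given stock covers all three months, and a plain row count can be misleading once a month has more than one row. The following is a minimal sketch of one way to check this, not part of the original answer: it counts the distinct months per stock and keeps the sum only when all three are present. The `msft` rows are hypothetical and added purely for illustration.

```python
import pandas as pd

# The amzn rows from the frame above (last three months only), plus a
# hypothetical msft stock that has three rows but is missing October.
df = pd.DataFrame({
    "value": [1, 2, 4, 7, 5, 5, 5],
    "date": pd.to_datetime([
        "2015-10-01", "2015-11-01", "2015-12-01", "2015-12-02",
        "2015-11-01", "2015-12-01", "2015-12-02",
    ]),
    "stock": ["amzn"] * 4 + ["msft"] * 3,
})

# Restrict to the window, then label each row with its calendar month.
recent = df[df["date"] >= pd.Timestamp("2015-10-01")]
recent = recent.assign(month=recent["date"].dt.to_period("M"))

# Per stock: total value and number of distinct months present.
grouped = recent.groupby("stock")
total = grouped["value"].sum()
months_seen = grouped["month"].nunique()

# Keep the sum only where all three months are present; NaN elsewhere.
result = total.where(months_seen == 3)
print(result)
```

Here `msft` has three rows but only two distinct months, so a check on the row count alone would let it through, while the distinct-month check sets it to NaN.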

Thank you for your help @MaxU. How do I implement the condition where, if a stock has 1 or 2 months missing, its sum is set to NaN? — codingknob, Mar 28 '16 at 22:04

score 0 · Accepted Answer · answered Mar 29 '16 at 01:47

Update:

Using the following data frame:

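import datetime as dt
import pandas as pd

# fread_year_month is a helper that is not shown in this post; it presumably
# returns one date per month between the two endpoints (inclusive).
df = pd.DataFrame({
    'value': [4, 2, 5, 6, 7, 8, 6, 5, 4, 1, 2, 4],
    'date': fread_year_month(dt.datetime(2015, 1, 1), dt.datetime(2015, 12, 1)),
    'stock': ['amzn'] * 12
}, columns=['value', 'date', 'stock'])

# msft stops at 2015-11, so it has only 2 of the last 3 months
df2 = pd.DataFrame({
    'value': [1] * 11,
    'date': fread_year_month(dt.datetime(2015, 1, 1), dt.datetime(2015, 11, 1)),
    'stock': ['msft'] * 11
}, columns=['value', 'date', 'stock'])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, df2])

df.set_index(['stock', 'date'], inplace=True)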

I did the following:

In [1]: idx = pd.IndexSlice
In [2]: criterion = df.loc[idx[:,'2015-10-01':'2015-12-01'],:].\
groupby(level=0).agg(['count']) > 2
In [3]: criterion = criterion['value']['count']
In [4]: df2 = df.loc[idx[:,'2015-10-01':'2015-12-01'],:].groupby(level=0).sum()
In [5]: df3 = pd.DataFrame(columns=['value'], index=criterion[criterion==False].index)
In [6]: pd.concat([df2[criterion], df3])

Out[6]:
       value
stock
amzn       7
msft     NaN

In this example MSFT did not have data for 2015-12 (it had only 2 of the 3 months) so its value was set to NaN as per my requirement.
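
A more compact variant of the same idea, offered as a sketch rather than as the method used above: `sum(min_count=3)` returns NaN for any group with fewer than three non-missing values, and a `reindex` against the full list of stocks brings back stocks that have no rows in the window at all. Here `pd.date_range` stands in for the `fread_year_month` helper, and the `goog` rows are hypothetical, added only to show the no-data case. Note that this counts rows, not distinct months, so it assumes at most one row per stock per month.

```python
import pandas as pd

# Data mirroring the example above; pd.date_range(freq="MS") is a stand-in
# for the fread_year_month helper, which is not shown in the post.
amzn = pd.DataFrame({
    "value": [4, 2, 5, 6, 7, 8, 6, 5, 4, 1, 2, 4],
    "date": pd.date_range("2015-01-01", "2015-12-01", freq="MS"),
    "stock": "amzn",
})
msft = pd.DataFrame({
    "value": 1,
    "date": pd.date_range("2015-01-01", "2015-11-01", freq="MS"),
    "stock": "msft",
})
# Hypothetical stock with no rows at all in the three-month window.
goog = pd.DataFrame({
    "value": [3, 3],
    "date": pd.to_datetime(["2015-01-01", "2015-02-01"]),
    "stock": "goog",
})

# Sort the MultiIndex so that label slicing on the date level is allowed.
df = pd.concat([amzn, msft, goog]).set_index(["stock", "date"]).sort_index()

idx = pd.IndexSlice
start, end = pd.Timestamp("2015-10-01"), pd.Timestamp("2015-12-01")
window = df.loc[idx[:, start:end], :]

# min_count=3: the sum is NaN for stocks with fewer than 3 values in the window.
sums = window.groupby(level="stock")["value"].sum(min_count=3)

# Stocks with no rows in the window are absent from `sums`;
# reindexing against every stock restores them as NaN.
all_stocks = df.index.get_level_values("stock").unique()
result = sums.reindex(all_stocks)
print(result)
```

With this data, `amzn` keeps its sum of 7, while `msft` (2 of 3 months) and `goog` (0 of 3 months) both come out as NaN.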
