2

I have the following pd.DataFrame

In [155]: df1
Out[155]: 
   ORDER_ID    ACQ       DATE UID
2         3  False 2014-01-03   1
3         4   True 2014-01-04   2
4         5  False 2014-01-05   3
6         7   True 2014-01-08   5
7         8  False 2014-01-08   5
9        10  False 2014-01-10   6
0        11  False 2014-01-11   6

where each entry is an order, with values for ORDER_ID, DATE, UID and ACQ (indicates whether this is the first order for the associated UID in the dataset).

I am trying to filter and keep all orders that were placed by users that have made their first order inside the time period covered in the dataset (i.e. at least one of the orders of such users satisfy ACQ == True).

So, the desired output would be:

   ORDER_ID    ACQ       DATE UID
3         4   True 2014-01-04   2
6         7   True 2014-01-08   5
7         8  False 2014-01-08   5

and I have managed to reach this by:

In [156]: df1.groupby('UID').filter(lambda x: x.ACQ.any() == True)
Out[156]: 
   ORDER_ID    ACQ       DATE UID
3         4   True 2014-01-04   2
6         7   True 2014-01-08   5
7         8  False 2014-01-08   5

However, when I try to find all the orders that were placed by users that have made their first order outside the time period covered in the dataset (i.e. All their orders should satisfy ACQ == False) I seem to be lost. I have tried this:

In [159]: df1.groupby('UID').filter(lambda x: x.ACQ.all() == False)
Out[159]: 
   ORDER_ID    ACQ       DATE UID
2         3  False 2014-01-03   1
4         5  False 2014-01-05   3
6         7   True 2014-01-08   5 ## <- This order is an acquisition, therefore all orders with UID == 5 should be filtered out.
7         8  False 2014-01-08   5
9        10  False 2014-01-10   6
0        11  False 2014-01-11   6

How should I go about filtering out all the orders placed by users that have ALL their orders satisfy ACQ == False?

Any ideas are very much appreciated, thanks!

Thanos
  • 2,472
  • 1
  • 16
  • 33

1 Answers1

3

You need first use condition and then add all:

print (df1.groupby('UID').filter(lambda x: (x.ACQ == False).all()))
   ORDER_ID    ACQ        DATE  UID
2         3  False  2014-01-03    1
4         5  False  2014-01-05    3
9        10  False  2014-01-10    6
0        11  False  2014-01-11    6
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252