2

Every customer is duplicated when they have more than one plan. I want to set the status to the customer:

If they have every product with 'canceled_at' filled, the customer status is cancelled, but when it's not every product with the canceled_at filled, but at least one, the status is 'downgrade' because he lost a product.

customer|canceled_at|status
x       |3/27/2018  |
x       |           |
y       |2/2/2018   |
y       |2/2/2018   |
z       |1/1/2018   |
a       |           |      

I already have the canceled status, now i only need the downgrade

df['status']=(df.groupby('customer')['canceled_at'].
  transform(lambda x: x.notna().all()).map({True:'canceled'})).fillna(df.status)
customer|canceled_at|status
x       |3/27/2018  |downgrade
x       |           |downgrade
y       |2/2/2018   |canceled
y       |2/2/2018   |canceled
z       |1/1/2018   |canceled
a       |           |      
Nick ODell
  • 15,465
  • 3
  • 32
  • 66

2 Answers2

1

Here is possible compare column for no missing values and grouping by Series customer with GroupBy.transform and GroupBy.all, GroupBy.any for test all values Trues (all non missing) or at least one value not missing (any non missing) and pass it to numpy.select:

g = df['canceled_at'].notna().groupby(df['customer'])
m1 = g.transform('all')
m2 = g.transform('any')

df['status'] = np.select([m1, m2],['canceled','downgrade'], np.nan)
print (df)
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN        nan

Or:

df['status'] = np.select([m1, m2],['canceled','downgrade'], '')
print (df)
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN         

If only NaNs groups need convert to downgrade:

mask = df['canceled_at'].notna().groupby(df['customer']).transform('all')
df['status'] = np.where(mask,'canceled','downgrade')
print (df)
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN  downgrade  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

Here is a way to do it:

import pandas as pd

def select_status(canceled):
    c = canceled.count()
    if c == 0:
        status = ''
    elif c == len(canceled):
        status = 'canceled'
    else:
        status = 'downgrade'
    return pd.Series(status, index=canceled.index)

df = pd.DataFrame({'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
                   'canceled_at': ['3/27/2018', None, '2/2/2018', '2/2/2018', '1/1/2018', None]})
df['status'] = df.groupby('customer')['canceled_at'].apply(select_status)
print(df)

Output:

  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x        None  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a        None
jdehesa
  • 58,456
  • 7
  • 77
  • 121