2

I have customers duplicates with different status because there is a row for each customer subscription/product. I want to generate a new_status for the customer and for it to be 'canceled', every subscription status must be 'canceled' together.

I used:

df['duplicated'] = df.groupby('customer', as_index=False)['customer'].cumcount()

to separate every duplicated in a index to indicate the duplicated values

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled|            | 0
 A       |canceled|            | 1
 B       |active  |            | 0
 B       |canceled|            | 1

Thus, I'd like to use .apply and/or .loc to generate:

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled| canceled   | 0
 A       |canceled| canceled   | 1
 B       |active  |            | 0
 B       |canceled|            | 1
Nick ODell
  • 15,465
  • 3
  • 32
  • 66

2 Answers2

2

Compare column by Series.eq for == and use GroupBy.transform with GroupBy.all for check if all values are Trues per groups, then compare Customer by Series.duplicated with keep=False for return all dupes. Last chain together by bitwise AND (&) and set values by numpy.where:

m1 = df['Status'].eq('canceled').groupby(df['Customer']).transform('all')
m2 = df['Customer'].duplicated(keep=False)

df['new_status'] = np.where(m1 & m2, 'cancelled', '')
print (df)
  Customer    Status new_status  duplicated
0        X  canceled                      0
1        X  canceled                      1
2        X    active                      2
3        Y  canceled                      0
4        A  canceled  cancelled           0
5        A  canceled  cancelled           1
6        B    active                      0
7        B  canceled                      1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

From what I understand, you can try doing:

df['new_status']=(df.groupby('Customer')['Status'].
  transform(lambda x: x.eq('canceled').all()).map({True:'cancelled'})).fillna(df.new_status)
print(df)

    Customer    Status new_status  duplicated
0   X         canceled             0         
1   X         canceled             1         
2   X         active               2         
3   Y         canceled  cancelled  0         
4   A         canceled  cancelled  0         
5   A         canceled  cancelled  1         
6   B         active               0         
7   B         canceled             1   

EDIT since expected o/p was changed:

df['new_status']=(df.groupby('Customer')['Status'].
             transform(lambda x: x.duplicated(keep=False)&(x.eq('canceled').all()))
                         .map({True:'cancelled',False:''}))
print(df)

  Customer    Status new_status  duplicated
0   X         canceled             0         
1   X         canceled             1         
2   X         active               2         
3   Y         canceled             0         
4   A         canceled  cancelled  0         
5   A         canceled  cancelled  1         
6   B         active               0         
7   B         canceled             1   
anky
  • 74,114
  • 11
  • 41
  • 70
  • Call function in `apply` or `transform` should be slow, if large dataframe. – jezrael Mar 19 '19 at 11:05
  • I think if not necessary. – jezrael Mar 19 '19 at 11:06
  • 1
    1800 cols. I think it's ok to use .apply and thanks I'm verifying the values, but looks correct – Ricardo Fernandes Mar 19 '19 at 11:16
  • I edited the post just to be as clear as possible, but the first o/p is even closer to what i want thank you. I'd like to understand more about how did you do that and when to use .map, .eq and .transform – Ricardo Fernandes Mar 19 '19 at 12:02
  • @RicardoFernandes no problem. So `x.eq('canceled').all()` checks all the grouped items from Status are equal to cancelled or not, so use `all()` returns true. than using map to replace true to 1 and false to blank. best will be try removing some parts of the code and i think you will understand. :) If any prob lemme know. Cheers..!! – anky Mar 19 '19 at 12:05
  • If i just delete the <,False=' ' > in will it just don't change the supposed false values? (cause it might be filled with other new status) – Ricardo Fernandes Mar 19 '19 at 12:15
  • not blank, but the way it already is (i didn't show in the example but is filled with some other "new_status" – Ricardo Fernandes Mar 19 '19 at 12:27
  • @RicardoFernandes edited. check the first part. same concept for second part. – anky Mar 19 '19 at 12:55