4

I have a pandas dataframe like this:

 order_id  buyer_id  phone_no
      611      261  9920570003
      681      261  9321613595
      707      261  9768270700
      707      261  9768270700
      707      261  9768270700
      708      261  9820895896
      710      261  7208615775
      710      261  7208615775
      710      261  7208615775
      711      261  9920986486
      800      234    Null
      801      256    Null
      803      289    Null

I have to replace the buyer_id column as follows:

   order_id   buyer_id   phone_no
      611      261_01  9920570003
      681      261_02  9321613595
      707      261_03  9768270700
      707      261_03  9768270700
      707      261_03  9768270700
      708      261_04  9820895896
      710      261_05  7208615775
      710      261_05  7208615775
      710      261_05  7208615775
      711      261_06  9920986486
      800       234       Null
      801       256       Null
      803       289       Null

So if the phone no is same it should treat it as same buyer else it should add new series to 261. I want only 261 buyer_id to be renamed other rows should be the same. Because I am treating orders coming from phone as 261

I am able to add series in 261 buyer_id with following code:

for i in range((len(phone_orders):
    print '261_%d' %i
    segments_data['buyer_id']

phone_orders contains all the phone orders.

But I didn't get how to replace the buyer_id column with desired output

df['buyer_id'] = '261_' + (df['phone_no'] !=      
df['phone_no'].shift()).cumsum().map("{:02}".format)


  buyer_id    phone_no
  261_01  9920570003
  261_02  9321613595
  261_03  9768270700
  261_03  9768270700
  261_03  9768270700
  261_04  9820895896
  261_05  7208615775
  261_05  7208615775
  261_05  7208615775
  261_06  9920986486
  261_07  9768270700
  261_07  9768270700
  261_07  9768270700
  261_08  9820895896
  261_09  7208615775
  261_09  7208615775
  261_09  7208615775

So 7208615775 phone_no should be 261_05 but it is giving 261_09.

Mike Müller
  • 82,630
  • 20
  • 166
  • 161
Neil
  • 7,937
  • 22
  • 87
  • 145

2 Answers2

3

Original questions

Find the unique phone numbers first and create ids:

id_map = {k: v for v, k in enumerate(df.phone_no.unique(), 1)}

Now, go throw all entries, adding them to the corresponding phone numbers:

df.buyer_id = df.apply(lambda x: '{}_{:02d}'.format(x.buyer_id, id_map[x.phone_no]), axis=1)

Result:

   order_id buyer_id    phone_no
0       611   261_01  9920570003
1       681   261_02  9321613595
2       707   261_03  9768270700
3       707   261_03  9768270700
4       707   261_03  9768270700
5       708   261_04  9820895896
6       710   261_05  7208615775
7       710   261_05  7208615775
8       710   261_05  7208615775
9       711   261_06  9920986486

Only for buyer id 261

id_map = {k: v for v, k in enumerate(df[df.buyer_id==261].phone_no.unique(), 1) }

def make_buyer_id(x):
    try:
        return '{}_{:02d}'.format(x.buyer_id, id_map[x.phone_no])
    except KeyError:
        return x.buyer_id

df.buyer_id = df.apply(make_buyer_id, axis=1)

Result:

    order_id buyer_id    phone_no
0        611   261_01  9920570003
1        681   261_02  9321613595
2        707   261_03  9768270700
3        707   261_03  9768270700
4        707   261_03  9768270700
5        708   261_04  9820895896
6        710   261_05  7208615775
7        710   261_05  7208615775
8        710   261_05  7208615775
9        711   261_06  9920986486
10       800      234        Null
11       801      256        Null
12       803      289        Null
Mike Müller
  • 82,630
  • 20
  • 166
  • 161
2

You can use astype for converting column buyer_id to string, shift, cumsum and then map:

df['buyer_id']=df['buyer_id'].astype(str) + '_' + (df['phone_no'] != df['phone_no'].shift())
                                                                                   .cumsum()
                                                                        .map("{:02}".format)
print df
   order_id buyer_id    phone_no
0       611   261_01  9920570003
1       681   261_02  9321613595
2       707   261_03  9768270700
3       707   261_03  9768270700
4       707   261_03  9768270700
5       708   261_04  9820895896
6       710   261_05  7208615775
7       710   261_05  7208615775
8       710   261_05  7208615775
9       711   261_06  9920986486

Explanation:

print (df['phone_no'] != df['phone_no'].shift())
0     True
1     True
2     True
3    False
4    False
5     True
6     True
7    False
8    False
9     True
Name: phone_no, dtype: bool
print (df['phone_no'] != df['phone_no'].shift()).cumsum()
0    1
1    2
2    3
3    3
4    3
5    4
6    5
7    5
8    5
9    6
Name: phone_no, dtype: int32
print (df['phone_no'] != df['phone_no'].shift()).cumsum().map("{:02}".format)
0    01
1    02
2    03
3    03
4    03
5    04
6    05
7    05
8    05
9    06
Name: phone_no, dtype: object

EDIT:

If you have want filter values 261 in column buyer_id, you can filter them with loc:

print df.loc[df['buyer_id'] == 261, 'buyer_id']
0    261
1    261
2    261
3    261
4    261
5    261
6    261
7    261
8    261
9    261
Name: buyer_id, dtype: int64

df.loc[df['buyer_id'] == 261, 'buyer_id'] =  df['buyer_id'].astype(str) + '_' + (df['phone_no'] != df['phone_no'].shift()).cumsum().map("{:02}".format)
print df
    order_id buyer_id    phone_no
0        611   261_01  9920570003
1        681   261_02  9321613595
2        707   261_03  9768270700
3        707   261_03  9768270700
4        707   261_03  9768270700
5        708   261_04  9820895896
6        710   261_05  7208615775
7        710   261_05  7208615775
8        710   261_05  7208615775
9        711   261_06  9920986486
10       800      234        Null
11       801      256        Null
12       803      289        Null
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252