
I want to cut off part of the strings in several columns (a different amount in each column) whenever one column contains a particular substring.

Example: assume the following DataFrame:

import pandas as pd
df = pd.DataFrame({'name':['Allan2','Mike39','Brenda4','Holy5'], 'Age': [30,20,25,18],'Zodiac':['Aries','Leo','Virgo','Libra'],'Grade':['A','AB','B','AA'],'City':['Aura','Somerville','Hendersonville','Gannon'], 'pahun':['a_b_c','c_d_e','f_g','h_i_j']})
print(df)

Out:

    name    Age     Zodiac  Grade   City            pahun
0   Allan2  30      Aries   A       Aura            a_b_c
1   Mike39  20      Leo     AB      Somerville      c_d_e
2   Brenda4 25      Virgo   B       Hendersonville  f_g
3   Holy5   18      Libra   AA      Gannon          h_i_j

For example, if an entry in column 'City' ends with 'e', cut the last three letters of 'City' and the last two letters of 'name' in that same row.

What I tried so far is something like this:

df['City'] = df['City'].apply(lambda x: df['City'].str[:-3] if df.City.str.endswith('e'))

That doesn't work, and I also don't know how to cut letters in other columns under the same condition.
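As a hedged aside on why the attempt fails: a conditional expression inside a lambda requires an `else` branch, and `apply` passes the lambda one cell `x` at a time, so the lambda should test and slice `x` rather than the whole `df['City']` column. A minimal corrected version for the City column alone, using a trimmed-down copy of the example data, might look like this:

```python
import pandas as pd

# Trimmed-down copy of the question's example data.
df = pd.DataFrame({'name': ['Allan2', 'Mike39', 'Brenda4', 'Holy5'],
                   'City': ['Aura', 'Somerville', 'Hendersonville', 'Gannon']})

# The lambda receives one cell `x`; the conditional expression needs an
# `else` branch so cells that don't match are returned unchanged.
df['City'] = df['City'].apply(lambda x: x[:-3] if x.endswith('e') else x)

print(df['City'].tolist())  # ['Aura', 'Somervi', 'Hendersonvi', 'Gannon']
```

Note that once City has been overwritten, the `endswith('e')` test can no longer be used to decide which names to cut, so the condition needs to be stored (for example in a boolean mask) before any column is modified.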

I'm grateful for any help. Thank you!

davvpo

2 Answers


You can record which rows have a City ending with 'e' in a boolean mask, then update only those rows with .loc:

mask = df['City'].str[-1] == 'e'

df.loc[mask, 'City'] = df.loc[mask, 'City'].str[:-3]
df.loc[mask, 'name'] = df.loc[mask, 'name'].str[:-2]

Output:

     name  Age Zodiac Grade         City  pahun
0  Allan2   30  Aries     A         Aura  a_b_c
1    Mike   20    Leo    AB      Somervi  c_d_e
2   Brend   25  Virgo     B  Hendersonvi    f_g
3   Holy5   18  Libra    AA       Gannon  h_i_j
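If more columns need trimming by different amounts, the same mask can drive a loop. A minimal sketch, assuming a hypothetical `cuts` dict that maps each column name to the number of trailing characters to remove (each count must be at least 1, because `str[:-0]` returns an empty string):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Allan2', 'Mike39', 'Brenda4', 'Holy5'],
                   'Age': [30, 20, 25, 18],
                   'City': ['Aura', 'Somerville', 'Hendersonville', 'Gannon']})

# Evaluate the condition once, before any column is modified.
mask = df['City'].str.endswith('e')

# Hypothetical mapping: column -> number of trailing characters to cut (>= 1).
cuts = {'City': 3, 'name': 2}

for col, n in cuts.items():
    # Only the masked rows are sliced and written back.
    df.loc[mask, col] = df.loc[mask, col].str[:-n]

print(df)
```

Computing the mask first matters here: the loop shortens City on its first pass, so re-testing `endswith('e')` inside the loop would give different rows for `name`.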
Quang Hoang
import pandas as pd
df = pd.DataFrame({'name':['Allan2','Mike39','Brenda4','Holy5'], 'Age': [30,20,25,18],'Zodiac':['Aries','Leo','Virgo','Libra'],'Grade':['A','AB','B','AA'],'City':['Aura','Somerville','Hendersonville','Gannon'], 'pahun':['a_b_c','c_d_e','f_g','h_i_j']})

def func(row):
    index = row.name  # label of the current row in df
    if row['City'].endswith('e'):  # check the end of column City for each row; implement your condition here
        df.at[index, 'City'] = row['City'][:-3]
        df.at[index, 'name'] = row['name'][:-2]

df.apply(func, axis=1)  # called only for its side effect of updating df
print(df)
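Writing to `df` from inside `apply` works, but it relies on a side effect. A side-effect-free sketch of the same row-wise idea has the function return a modified copy of each row and assigns the result back; `trim_row` is a hypothetical name, and the condition and cut lengths follow the question:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Allan2', 'Mike39', 'Brenda4', 'Holy5'],
                   'Age': [30, 20, 25, 18],
                   'City': ['Aura', 'Somerville', 'Hendersonville', 'Gannon']})

def trim_row(row):
    # Copy so the row passed in by apply is never modified in place.
    row = row.copy()
    if row['City'].endswith('e'):
        row['City'] = row['City'][:-3]
        row['name'] = row['name'][:-2]
    return row

# Each returned row becomes a row of the new DataFrame.
df = df.apply(trim_row, axis=1)
print(df)
```

For larger frames, the vectorized `.loc` mask approach is usually faster than any row-wise `apply`.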
Pouya Esmaeili