
I have a list of strings

x=['llc', 'corp', 'sa'] 

I need to remove these words from the end of the strings in a column of my dataframe:

df = pd.DataFrame(['Geeks corp', 'toto', 'tete coope', 'tete sa', 'tata corp', 'titi', 'tmtm'] , columns =['Names']) 

As output I would like to have:

list = ['Geeks', 'toto', 'tete coope', 'tete', 'tata', 'titi', 'tmtm']

What are your suggestions?

APhillips

2 Answers

Use `Series.str.replace` with a regex pattern: `$` matches the end of the string, `\s+` matches the whitespace before the suffix, and the alternatives are joined with `|` (regex OR):

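    # \s+ requires whitespace before the suffix, $ anchors it to the end of the string;
    # the r prefix keeps \s from being treated as a Python string escape
    pat = '|'.join(rf'\s+{y}$' for y in x)
    # regex=True is required in pandas >= 2.0, where str.replace defaults to literal matching
    df['Names'] = df['Names'].str.replace(pat, '', regex=True)
    print(df)
            Names
    0       Geeks
    1        toto
    2  tete coope
    3        tete
    4        tata
    5        titi
    6        tmtm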
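A slightly more defensive variant of the same idea, sketched under the assumption that the suffixes might contain regex metacharacters (for example a hypothetical `co.`): `re.escape` makes each suffix literal, and grouping the alternation means `\s+` and `$` are written once and apply to every alternative.

```python
import re

import pandas as pd

x = ['llc', 'corp', 'sa']
df = pd.DataFrame(['Geeks corp', 'toto', 'tete coope', 'tete sa', 'tata corp', 'titi', 'tmtm'],
                  columns=['Names'])

# Escape each suffix so characters like '.' are matched literally, then
# require whitespace before the group and the end of the string after it.
pat = r'\s+(?:' + '|'.join(re.escape(y) for y in x) + r')$'

df['Names'] = df['Names'].str.replace(pat, '', regex=True)
print(df['Names'].tolist())
# ['Geeks', 'toto', 'tete coope', 'tete', 'tata', 'titi', 'tmtm']
```

The match is case-sensitive; `str.replace` also accepts `case=False` when `regex=True` if `Corp` and `corp` should both be removed.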
jezrael
  • Just a second question: if my list contains strings with more than one word, for example x=['llc sa', 'corp cor', 'sa se'], and I want to find those phrases anywhere in the string, not only at the end, how can I do that? – Simon Benavides Jan 21 '20 at 16:03
    @Hector Simon Benavides then use `pat = '|'.join(r"\b{}\b".format(y) for y in x); df['Names'] = df['Names'].str.replace('(' + pat + ')', '', regex=True).str.replace(' +', ' ', regex=True)`, not tested, on phone only. – jezrael Jan 21 '20 at 16:21
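A tested sketch of the approach suggested in that comment, with hypothetical sample rows invented for illustration: each phrase is escaped and wrapped in `\b` word boundaries so it only matches whole words, every occurrence is removed, and the leftover whitespace is collapsed and trimmed.

```python
import re

import pandas as pd

# Multi-word phrases from the follow-up question; the rows are hypothetical sample data
x = ['llc sa', 'corp cor', 'sa se']
df = pd.DataFrame(['Geeks corp cor', 'toto llc sa inc', 'sa se tete', 'tete sa', 'titi'],
                  columns=['Names'])

# \b on both sides stops a phrase from matching inside a longer word
pat = r'\b(?:' + '|'.join(re.escape(y) for y in x) + r')\b'

df['Names'] = (df['Names']
               .str.replace(pat, '', regex=True)      # drop every phrase occurrence
               .str.replace(r'\s+', ' ', regex=True)  # collapse runs of spaces left behind
               .str.strip())                          # trim leading/trailing spaces
print(df['Names'].tolist())
# ['Geeks', 'toto inc', 'tete', 'tete sa', 'titi']
```

Note that `'tete sa'` is left unchanged because `sa` on its own is not one of the phrases.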

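This loop-based solution also works for the sample data, although `str.replace` here removes each word wherever it appears in the string, not only at the end: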

    import pandas as pd

    x = ['llc', 'corp', 'sa']
    df = pd.DataFrame(['Geeks corp', 'toto', 'tete coope', 'tete sa', 'tata corp', 'titi', 'tmtm'], columns=['Names'])
    for i in x:
        # literal (non-regex) replacement, then trim the space left behind
        df["Names"] = df["Names"].str.replace(i, "", regex=False).str.strip()
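If you prefer to avoid regular expressions but still only touch the end of each string, `Series.str.removesuffix` (available since pandas 1.4) is a possible alternative. This is a sketch assuming exactly one space separates the name from the suffix.

```python
import pandas as pd

x = ['llc', 'corp', 'sa']
df = pd.DataFrame(['Geeks corp', 'toto', 'tete coope', 'tete sa', 'tata corp', 'titi', 'tmtm'],
                  columns=['Names'])

# removesuffix only strips text when the string ends with it, so a word
# such as 'sa' in the middle of a name is left alone.
for suffix in x:
    df['Names'] = df['Names'].str.removesuffix(' ' + suffix)
print(df['Names'].tolist())
# ['Geeks', 'toto', 'tete coope', 'tete', 'tata', 'titi', 'tmtm']
```

Because the suffixes are applied one after another, the result can depend on the order of `x` when a string ends with more than one of them.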
i_am_deesh