How to remove special characters from a string before specific character?

Question

I have a df that has a column called EMAIL, which contains various email addresses. I want to remove all the special characters, specifically ., -, and _ that come before @ and append a new column NEW_EMAIL. For example, if df['EMAIL'] = 'ab_cd_123@email.com', I want df['NEW_EMAIL'] = 'abcd123@email.com'.

I was able to remove periods successfully with my codes below, but cannot seem to remove underscore or dash in the same line of code. Right now, I am repeating the same line of codes to remove those three special characters, which is quite ugly. Can someone lend me a hand please? Thank you for your help in advance.

df['NEW_EMAIL'] = df.EMAIL.str.replace(r'\.(?!.{1,4}$)','', regex = True)
df['NEW_EMAIL'] = df.NEW_EMAIL.str.replace(r'\.(?!.{1,4}$)','', regex = True)
df['NEW_EMAIL'] = df.NEW_EMAIL.str.replace(r'\.(?!.{1,4}$)','', regex = True)

Wiktor Stribiżew · Accepted Answer · 2021-05-14T19:55:02.567

You can use

df['NEW_EMAIL'] = df['EMAIL'].str.replace(r'[._-](?=[^@]*@)', '', regex=True)

See the regex demo. Details:

[._-] - a ., _ or - char
(?=[^@]*@) - a positive lookahead that requires the presence of any zero or more chars other than @ and then a @ char immediately to the right of the current location.

If you need to replace/remove any special char, you should use

df['NEW_EMAIL'] = df['EMAIL'].str.replace(r'[\W_](?=[^@]*@)', '', regex=True)

See a Pandas test:

>>> import pandas as pd
>>> df = pd.DataFrame({'EMAIL':['ab_cd_123@email.com', 'ab_cd.12-3@email.com']})
>>> df['EMAIL'].str.replace(r'[._-](?=[^@]*@)', '', regex=True)
0    abcd123@email.com
1    abcd123@email.com
Name: EMAIL, dtype: object

@casi_cielo32 Just in case you want to remove any special char there, replace `[._-]` with `[\W_]` in the pattern. I added this version to the answer as it might turn out useful for others. — Wiktor Stribiżew, May 14 '21 at 19:53

How to remove special characters from a string before specific character?

1 Answers1

Linked