Replace special characters in pandas dataframe from a string of special characters

Question

I have created a pandas dataframe called df using this code:

import numpy as np import pandas as pd

ds = {'col1' : ['1','3/','4'], 'col2':['A','!B','@C']}

df =pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)

  col1 col2
0    1    A
1   3/   !B
2    4   @C

The columns contain some special characters (/ and @) that I need to replace with a blank space.

Now, I have a list of special characters:

listOfSpecialChars = '¬`!"£$£#/,.+*><@|"'

How can I replace any of the special characters listed in listOfSpecialChars with a blank space, any time I encounter them at any point in a dataframe, for any columns? At the moment I am dealing with a 100K-record dataframe with 560 columns, so I can't write a piece of code for each variable.

do you really want to replace with a space, or to delete? – mozway Jun 15 '22 at 12:01 — mozway, Jun 15 '22 at 12:01
replace with a blank or delete, either way is fine to me – Giampaolo Levorato Jun 15 '22 at 12:04 — Giampaolo Levorato, Jun 15 '22 at 12:04

score 1 · Accepted Answer · answered Jun 15 '22 at 12:04

1

You can use apply with str.replace:

import re
chars = ''.join(map(re.escape, listOfSpecialChars))

df2 = df.apply(lambda c: c.str.replace(f'[{chars}]', '', regex=True))

Alternatively, stack/unstack:

df2 = df.stack().str.replace(f'[{chars}]', '', regex=True).unstack()

output:

  col1 col2
0    1    A
1    3    B
2    4    C

answered Jun 15 '22 at 12:04

mozway

194,879
13
39
75

Also: `df.apply(lambda v, table=str.maketrans('', '', listOfSpecialChars): v.str.translate(table))` seems a reasonable option – Jon Clements Jun 15 '22 at 12:20
thanks for the reply. The issue I have got though, is that the dataset might also contain some numeric columns, whose values get replaced with NaN. – Giampaolo Levorato Jun 15 '22 at 12:42
1

@GiampaoloLevorato you can restore them with `df2 = df2.combine_first(df1)`, or apply the transform only on the non-numeric columns – mozway Jun 15 '22 at 12:47

score 0 · Answer 2 · edited Nov 21 '22 at 13:34

0

## Removes everything except letters, numbers, dash, and underscore. 
df['data'] = df['data'].str.replace(r'[^A-Za-z0-9\-\_]+', '')

edited Nov 21 '22 at 13:34

Yonatan Karp-Rudin

1,056
8
24

answered Nov 14 '22 at 23:15

Yas

1
2

Replace special characters in pandas dataframe from a string of special characters

2 Answers2