1

I have created a pandas dataframe called df using this code:

import numpy as np import pandas as pd

ds = {'col1' : ['1','3/','4'], 'col2':['A','!B','@C']}

df =pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)

  col1 col2
0    1    A
1   3/   !B
2    4   @C

The columns contain some special characters (/ and @) that I need to replace with a blank space.

Now, I have a list of special characters:

listOfSpecialChars = '¬`!"£$£#/,.+*><@|"'

How can I replace any of the special characters listed in listOfSpecialChars with a blank space, any time I encounter them at any point in a dataframe, for any columns? At the moment I am dealing with a 100K-record dataframe with 560 columns, so I can't write a piece of code for each variable.

Giampaolo Levorato
  • 1,055
  • 1
  • 8
  • 22

2 Answers2

1

You can use apply with str.replace:

import re
chars = ''.join(map(re.escape, listOfSpecialChars))

df2 = df.apply(lambda c: c.str.replace(f'[{chars}]', '', regex=True))

Alternatively, stack/unstack:

df2 = df.stack().str.replace(f'[{chars}]', '', regex=True).unstack()

output:

  col1 col2
0    1    A
1    3    B
2    4    C
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Also: `df.apply(lambda v, table=str.maketrans('', '', listOfSpecialChars): v.str.translate(table))` seems a reasonable option – Jon Clements Jun 15 '22 at 12:20
  • thanks for the reply. The issue I have got though, is that the dataset might also contain some numeric columns, whose values get replaced with NaN. – Giampaolo Levorato Jun 15 '22 at 12:42
  • 1
    @GiampaoloLevorato you can restore them with `df2 = df2.combine_first(df1)`, or apply the transform only on the non-numeric columns – mozway Jun 15 '22 at 12:47
0
## Removes everything except letters, numbers, dash, and underscore. 
df['data'] = df['data'].str.replace(r'[^A-Za-z0-9\-\_]+', '')
Yonatan Karp-Rudin
  • 1,056
  • 8
  • 24
Yas
  • 1
  • 2