
How do I drop pandas dataframe columns that contain special characters such as @ / ] [ } { - _ etc.?

For example I have the following dataframe (called df):

[image of the dataframe, not available]

I need to drop the columns Name and Matchkey because they contain some special characters. Also, how can I specify a list of special characters that determines which columns are dropped?

For example: I'd like to drop the columns that contain (in any record, in any cell) any of the following special characters:

listOfSpecialCharacters: ¬,`,!,",£,$,£,#,/,\

Giampaolo Levorato
  • Can you provide the text version of your dataset so that I can match the answer with the same data? Also, minor detail, but did you want to include `_` as a character to blacklist? – mozway Apr 06 '22 at 08:47
  • Ah, never mind! I have sorted it! I forgot to use .str! Thanks!! – Giampaolo Levorato Apr 06 '22 at 10:21

1 Answer


One option is to use a regex with str.contains and apply, then use boolean indexing to drop the columns:

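import re

# Characters that disqualify a column (the backslash is doubled in the literal)
chars = '¬`!"£$£#/\\'
# Build a character class, escaping regex metacharacters such as $ and \
regex = f'[{"".join(map(re.escape, chars))}]'
# '[¬`!"£\\$£\\#/\\\\]'

# Keep only the columns in which no cell matches the regex
df2 = df.loc[:, ~df.apply(lambda c: c.str.contains(regex).any())]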

Example:

# input
     A    B    C
0  123  12!  123
1  abc  abc  a¬b

# output
     A
0  123
1  abc
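
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the example input and applies the same approach. The intermediate name `has_special` is only introduced for illustration, and all columns are assumed to hold strings.

```python
import re

import pandas as pd

# Same input as the example: every column holds strings.
df = pd.DataFrame({
    "A": ["123", "abc"],
    "B": ["12!", "abc"],
    "C": ["123", "a¬b"],
})

# Character class that matches any single unwanted character.
chars = '¬`!"£$£#/\\'
regex = f'[{"".join(map(re.escape, chars))}]'

# One boolean per column: True if at least one cell matches.
has_special = df.apply(lambda c: c.str.contains(regex).any())

# Boolean indexing on the column axis keeps the clean columns only.
df2 = df.loc[:, ~has_special]
print(df2)
```

Keeping the mask in its own variable makes it easy to inspect which columns were flagged before dropping them.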
mozway
  • Thanks! I get this error: AttributeError: Can only use .str accessor with string values! Looks like that code applies only to string columns. – Giampaolo Levorato Apr 06 '22 at 10:16
  • You can either do `c.astype(str).str.contains(regex).any()` or apply it only on the str/object columns using [`select_dtypes`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html) – mozway Apr 06 '22 at 10:56
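
The two options from the last comment could look roughly like the sketch below. The frame and the names `mask_all`, `text_mask`, `df_opt1` and `df_opt2` are hypothetical. The string columns are created with `dtype=object` explicitly, on the assumption that some pandas versions may otherwise infer a dedicated string dtype that `include="object"` would not select.

```python
import re

import pandas as pd

# Hypothetical frame: two text columns plus a numeric one,
# which has no .str accessor.
df = pd.DataFrame({
    "A": pd.Series(["123", "abc"], dtype=object),
    "B": pd.Series(["12!", "abc"], dtype=object),
    "N": [1, 2],
})

chars = '¬`!"£$£#/\\'
regex = f'[{"".join(map(re.escape, chars))}]'

# Option 1: cast every column to str first, so .str works everywhere.
mask_all = df.apply(lambda c: c.astype(str).str.contains(regex).any())
df_opt1 = df.loc[:, ~mask_all]

# Option 2: test only the object columns and drop the flagged ones by name;
# non-object columns are left untouched.
text_mask = df.select_dtypes(include="object").apply(
    lambda c: c.str.contains(regex).any()
)
df_opt2 = df.drop(columns=text_mask[text_mask].index)
```

Option 1 also scans the string form of numeric values, while option 2 never inspects them, so the two can differ if the character list includes something like `-` or `.`.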