
I have a pandas DataFrame like the one below ("col1" has a string data type):

col1
-----
1234AABY332
857363opx00C*+
9994TyF@@@!
...

I need to remove all special characters, such as ["-", ",", ".", ":", "/", "@", "#", "&", "$", "%", "+", "*", "(", ")", "=", "!", "", "~", "~"], as well as all letters (both uppercase and lowercase), for example A, a, b, c, and so on.

As a result, I need a DataFrame like this:

col1
-----
1234332
85736300
9994
...

How can I do that in pandas?

dingaro

2 Answers


I might phrase your requirement as removing all non-digit characters:

df["col1"] = df["col1"].str.replace(r'\D+', '', regex=True)
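
For reference, here is a minimal, self-contained sketch of the same replacement applied to the sample rows from the question. The DataFrame construction and the `result` variable are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C*+", "9994TyF@@@!"]})

# \D matches any single non-digit character; \D+ matches a run of them.
# Each run is replaced with an empty string, so only digits remain.
df["col1"] = df["col1"].str.replace(r"\D+", "", regex=True)

# Illustrative helper variable: the cleaned values as a plain list
result = df["col1"].tolist()
print(result)  # ['1234332', '85736300', '9994']
```

Note that the column still holds strings after this step. Also, for Python `str` patterns `\d` matches any Unicode decimal digit, so if only ASCII digits should be kept, `[^0-9]+` may be a safer pattern.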
Tim Biegeleisen

You can also use findall to extract only the digits:

df['col1'] = df['col1'].str.findall(r'(\d)').str.join('')
print(df)

# Output
       col1
0   1234332
1  85736300
2      9994

You can append .astype(int) to convert the resulting digit strings to integers.
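
A possible sketch of that conversion, using the sample rows from the question. The `extra` Series and the `pd.to_numeric` fallback are assumptions added for illustration: `.astype(int)` raises a `ValueError` on an empty string, which is what a row without any digits would produce.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C*+", "9994TyF@@@!"]})

# Collect every digit per row, join them back together, then cast to integer
df["col1"] = df["col1"].str.findall(r"\d").str.join("").astype(int)
converted = df["col1"].tolist()
print(converted)  # [1234332, 85736300, 9994]

# Hypothetical extra data: the first row contains no digits at all.
extra = pd.Series(["abc", "12x"])
digits_only = extra.str.replace(r"\D+", "", regex=True)

# errors="coerce" turns values that cannot be parsed into missing values
# instead of raising, so the empty string becomes a missing value.
safe = pd.to_numeric(digits_only, errors="coerce")
```

If every row is guaranteed to contain at least one digit, the plain `.astype(int)` call is enough.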

Corralien