Pandas - Remove strings from a float number in a column

Question

I have a dataframe like the following:

plan type  hour status     code
A    cont   0    ok       010.0
A    cont   2    ok      025GWA
A    cont   0    notok   010VVT
A    cont   0    other     6.05
A    vend   1    ok        6.01

The column code has a few string characters with different letters. In the end I would like to transform the 'code' column to float. I tried:

df['code'] = df['code'].str.extract('(\d+)').astype(float)

but with this I got:

plan type  hour status     code
A    cont   0    ok        10.0
A    cont   2    ok        25.0 
A    cont   0    notok     10.0
A    cont   0    other      6.0
A    vend   1    ok         6.0

How can I get a result like the following?

plan type  hour status     code
A    cont   0    ok       10.00
A    cont   2    ok       25.00
A    cont   0    notok    10.00
A    cont   0    other     6.05
A    vend   1    ok        6.01

score 6 · Answer 1 · answered Jan 18 '18 at 10:56

Instead of than extraction, you could consider a substitution-based approach.

Use str.replace, and then convert to float with an astype/to_numeric conversion.

 df.code.str.replace('[^\d.]', '').astype(float)

Or,

pd.to_numeric(df.code.str.replace('[^\d.]', ''), errors='coerce')

0    10.00
1    25.00
2    10.00
3     6.05
4     6.01
Name: code, dtype: float64

score 3 · Accepted Answer · answered Jan 18 '18 at 10:59

3

Use (\d*\.?\d*)

In [441]: df['code'].str.extract('(\d*\.?\d*)', expand=False).astype(float)
Out[441]:
0    10.00
1    25.00
2    10.00
3     6.05
4     6.01
Name: code, dtype: float64

answered Jan 18 '18 at 10:59

Zero

74,117
18
147
154

I am having the same problem apart as I have `%` on the end of some of my floats, however it hasn't seemed to have removed it. – geds133 Nov 20 '18 at 09:42

Pandas - Remove strings from a float number in a column

2 Answers2