5

I have a dataframe like the following:

plan type  hour status     code
A    cont   0    ok       010.0
A    cont   2    ok      025GWA
A    cont   0    notok   010VVT
A    cont   0    other     6.05
A    vend   1    ok        6.01

The column code has a few string characters with different letters. In the end I would like to transform the 'code' column to float. I tried:

df['code'] = df['code'].str.extract('(\d+)').astype(float)

but with this I got:

plan type  hour status     code
A    cont   0    ok        10.0
A    cont   2    ok        25.0 
A    cont   0    notok     10.0
A    cont   0    other      6.0
A    vend   1    ok         6.0

How can I get a result like the following?

plan type  hour status     code
A    cont   0    ok       10.00
A    cont   2    ok       25.00
A    cont   0    notok    10.00
A    cont   0    other     6.05
A    vend   1    ok        6.01
Thabra
  • 337
  • 2
  • 9

2 Answers2

6

Instead of than extraction, you could consider a substitution-based approach.

Use str.replace, and then convert to float with an astype/to_numeric conversion.

 df.code.str.replace('[^\d.]', '').astype(float)

Or,

pd.to_numeric(df.code.str.replace('[^\d.]', ''), errors='coerce')

0    10.00
1    25.00
2    10.00
3     6.05
4     6.01
Name: code, dtype: float64
cs95
  • 379,657
  • 97
  • 704
  • 746
3

Use (\d*\.?\d*)

In [441]: df['code'].str.extract('(\d*\.?\d*)', expand=False).astype(float)
Out[441]:
0    10.00
1    25.00
2    10.00
3     6.05
4     6.01
Name: code, dtype: float64
Zero
  • 74,117
  • 18
  • 147
  • 154
  • I am having the same problem apart as I have `%` on the end of some of my floats, however it hasn't seemed to have removed it. – geds133 Nov 20 '18 at 09:42