pandas column datatype : object to int - Error: float' object has no attribute 'replace'

Question

Sample data in pandas dataframe:

df['Revenue']


0                $0.00
1       $12,982,681.00
2                  NaN
3       $10,150,623.00
4                  NaN
             ...      
1713               NaN
1714               NaN
1715               NaN
1716               NaN
1717               NaN
Name: Revenue, Length: 1718, dtype: object

I need to change the column from currency format to integer so that I can run computations and aggregations.

# Fix format currency
if df['Revenue'].dtype == 'object':

    df['Revenue'] = df['Revenue'].apply(lambda x: x.replace('$','')).apply(lambda x: x.replace(',','')).astype(np.int64)

When I run the above line of code to transform the datatype, I run into the following error:

  3 # Fix format currency
  4 if df['Revenue'].dtype == 'object':
  5     df['Revenue'] = df['Revenue'].apply(lambda x: x.replace('$','')).apply(lambda x: x.replace(',','')).astype(np.int64)
 

  AttributeError: 'float' object has no attribute 'replace'

wwnde · Answer 1 · 2021-11-12T22:30:15.953

1

You can try replace everything eexcept digits and the dot. If you are reading in file as csv, you can have this controlled at the read stage.

 df['Revenue'].fillna(0).astype(str).replace('[^0-9\.]','', regex=True).str.split('\.').str[0].astype(int)



    Revenue
0            0
1     12982681
2            0
3     10150623
4            0
1713         0
1714         0
1715         0
1716         0
1717         0

edited Nov 12 '21 at 22:30

answered Nov 12 '21 at 21:57

wwnde

26,119
6
18
32

how can you control at read stage? can it be done in `pd.read_csv`? – kms Nov 12 '21 at 21:58
They don't need the dot either - they're converting to int. You'd also need to do the int conversion at some point. – gshpychka Nov 12 '21 at 21:59
Look at https://stackoverflow.com/questions/36320317/pandas-read-csv-ignore-dollar-sign-when-parsing-numbers – wwnde Nov 12 '21 at 22:03
@gshpychka he can use the following if he needs them as interger `df['Revenue'].astype(str).replace('[^0-9\.]','', regex=True).str.split('\.').str[0]` – wwnde Nov 12 '21 at 22:07
@wwnde This doesn't change the column dtype though. It is still `object`, I need to convert the dtype to `int`. – kms Nov 12 '21 at 22:08
Yup ^, and I'm pretty sure `.str[0]` wouldn't work like that on a Series. – gshpychka Nov 12 '21 at 22:09
@wwnde @gshpychka there are `NaNs` in the column as well that needs to handled . Applying regex replaces `NaNs` with ` `. – kms Nov 12 '21 at 22:18
1

Use `fillna()` for those. – gshpychka Nov 12 '21 at 22:19
Does this help @kms ?`df['Revenue'].fillna(0).astype(str).replace('[^0-9\.]','', regex=True).str.split('\.').str[0].astype(int)` – wwnde Nov 12 '21 at 22:28

pandas column datatype : object to int - Error: float' object has no attribute 'replace'

1 Answers1