1

Sample data in pandas dataframe:

df['Revenue']


0                $0.00
1       $12,982,681.00
2                  NaN
3       $10,150,623.00
4                  NaN
             ...      
1713               NaN
1714               NaN
1715               NaN
1716               NaN
1717               NaN
Name: Revenue, Length: 1718, dtype: object    

I need to change the column from currency format to integer so that I can run computations and aggregations.

# Fix format currency
if df['Revenue'].dtype == 'object':

    df['Revenue'] = df['Revenue'].apply(lambda x: x.replace('$','')).apply(lambda x: x.replace(',','')).astype(np.int64)

When I run the above line of code to transform the datatype, I run into the following error:

  3 # Fix format currency
  4 if df['Revenue'].dtype == 'object':
  5     df['Revenue'] = df['Revenue'].apply(lambda x: x.replace('$','')).apply(lambda x: x.replace(',','')).astype(np.int64)
 

  AttributeError: 'float' object has no attribute 'replace'
kms
  • 1,810
  • 1
  • 41
  • 92

1 Answers1

1

You can try replace everything eexcept digits and the dot. If you are reading in file as csv, you can have this controlled at the read stage.

 df['Revenue'].fillna(0).astype(str).replace('[^0-9\.]','', regex=True).str.split('\.').str[0].astype(int)



    Revenue
0            0
1     12982681
2            0
3     10150623
4            0
1713         0
1714         0
1715         0
1716         0
1717         0
wwnde
  • 26,119
  • 6
  • 18
  • 32
  • how can you control at read stage? can it be done in `pd.read_csv`? – kms Nov 12 '21 at 21:58
  • They don't need the dot either - they're converting to int. You'd also need to do the int conversion at some point. – gshpychka Nov 12 '21 at 21:59
  • Look at https://stackoverflow.com/questions/36320317/pandas-read-csv-ignore-dollar-sign-when-parsing-numbers – wwnde Nov 12 '21 at 22:03
  • @gshpychka he can use the following if he needs them as interger `df['Revenue'].astype(str).replace('[^0-9\.]','', regex=True).str.split('\.').str[0]` – wwnde Nov 12 '21 at 22:07
  • @wwnde This doesn't change the column dtype though. It is still `object`, I need to convert the dtype to `int`. – kms Nov 12 '21 at 22:08
  • Yup ^, and I'm pretty sure `.str[0]` wouldn't work like that on a Series. – gshpychka Nov 12 '21 at 22:09
  • @wwnde @gshpychka there are `NaNs` in the column as well that needs to handled . Applying regex replaces `NaNs` with ` `. – kms Nov 12 '21 at 22:18
  • 1
    Use `fillna()` for those. – gshpychka Nov 12 '21 at 22:19
  • Does this help @kms ?`df['Revenue'].fillna(0).astype(str).replace('[^0-9\.]','', regex=True).str.split('\.').str[0].astype(int)` – wwnde Nov 12 '21 at 22:28