
So I'm trying to transform these values into floats so that I can `sum()` them. The problem is that something weird won't let me do it.

Data:

import pandas as pd

cw = pd.DataFrame({"campaign": "151515151515",
                   "Media_Cost": "$ 14,52"}, index=[0])


cw.dtypes

Media_Cost       object

Here are my attempts. I tried each of the lines below, one at a time, and mysteriously none of them works:

cw["Media_Cost"] = cw["Media_Cost"].str.replace('$','')

# Attempt 1
cw.Media_Cost = cw.Media_Cost.astype(float)

# Attempt 3
cw.Media_Cost = len(float(cw.Media_Cost))

# Attempt 4
cw.Media_Cost = cw.Media_Cost.apply(lambda x: float(cw.Media_Cost))

The error persists:

cw["Media_Cost"] = cw["Media_Cost"].str.replace('$','').str.replace(',', '.').astype(float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-382-f5688d76abed> in <module>
      1 # cw.Media_Cost = cw.Media_Cost.apply(lambda x: float(cw.Media_Cost))
----> 2 cw["Media_Cost"] = cw["Media_Cost"].str.replace('$','').str.replace(',', '.').astype(float)
      3 
      4 # cw.Media_Cost = float(cw.Media_Cost)
      5 

~\Anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5689             # else, only a single dtype is given
   5690             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5691                                          **kwargs)
   5692             return self._constructor(new_data).__finalize__(self)
   5693 

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, **kwargs)
    529 
    530     def astype(self, dtype, **kwargs):
--> 531         return self.apply('astype', dtype=dtype, **kwargs)
    532 
    533     def convert(self, **kwargs):

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
    393                                             copy=align_copy)
    394 
--> 395             applied = getattr(b, f)(**kwargs)
    396             result_blocks = _extend_blocks(applied, result_blocks)
    397 

~\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
    532     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    533         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 534                             **kwargs)
    535 
    536     def _astype(self, dtype, copy=False, errors='raise', values=None,

~\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
    631 
    632                     # _astype_nansafe works fine with 1-d only
--> 633                     values = astype_nansafe(values.ravel(), dtype, copy=True)
    634 
    635                 # TODO(extension)

~\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
    700     if copy or is_object_dtype(arr) or is_object_dtype(dtype):
    701         # Explicit copy, or required since NumPy can't view from / to object.
--> 702         return arr.astype(dtype, copy=True)
    703 
    704     return arr.view(dtype)

ValueError: could not convert string to float: '1.443.48'
Peter
  • Yes, in Python `'1,443.48'` does not represent a float. Float literals cannot have thousands separators. Just keep cleaning your strings – juanpa.arrivillaga Dec 16 '19 at 17:45
  • @juanpa.arrivillaga what function would I use to accomplish that? – Peter Dec 16 '19 at 18:02
  • Note that you could use `babel.numbers.parse_decimal` with (e.g.) a German locale to handle localised numbers like that. It's going to be much slower than "C format" floats, which is what Python expects by default – Sam Mason Dec 16 '19 at 18:04
  • @Peter just noticed that you're from Portugal, which uses the same comma for the decimal point as Germany, so you could use `locale='pt'` with babel – Sam Mason Dec 16 '19 at 18:07
  • @SamMason yes, I was looking at `'1.443.48'`; I see the problem now. Anyway, you probably need something like: https://stackoverflow.com/questions/40717037/how-to-convert-euro-currency-string-to-float-number – juanpa.arrivillaga Dec 16 '19 at 18:10
  • @juanpa.arrivillaga yup, `1.443.48` is indeed broken! This question looks very similar to https://stackoverflow.com/a/22137890/1358308 now; OP should strip off the `$`s and then follow that answer – Sam Mason Dec 16 '19 at 18:15
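Following the comments: the chain in the question turns a US-style value such as `' 1,443.48'` into `' 1.443.48'`, which has two dots and cannot be parsed. Which character to drop depends on the format. US-style strings use `,` for thousands and `.` for decimals, while European-style strings (like the `"$ 14,52"` sample, and the Portuguese/German convention mentioned in the comments) use `.` for thousands and `,` for decimals. Below is a minimal sketch, assuming every value in a column follows one known format. `to_float` is a hypothetical helper, not a pandas function:

```python
import pandas as pd


def to_float(s: pd.Series, thousands: str, decimal: str) -> pd.Series:
    """Hypothetical helper: strip '$', drop the thousands separator,
    normalise the decimal mark to '.', then cast to float."""
    cleaned = (
        s.str.replace("$", "", regex=False)        # literal '$', not a regex anchor
         .str.replace(thousands, "", regex=False)  # drop thousands separator first
         .str.replace(decimal, ".", regex=False)   # then rewrite the decimal mark
         .str.strip()                              # remove the leftover space
    )
    return cleaned.astype(float)


us = pd.Series(["$ 1,443.48", "$ 14.52"])  # comma = thousands, dot = decimal
eu = pd.Series(["$ 1.443,48", "$ 14,52"])  # dot = thousands, comma = decimal

print(to_float(us, thousands=",", decimal=".").tolist())
print(to_float(eu, thousands=".", decimal=",").tolist())
```

The order of the two middle replacements matters. Removing the thousands separator before rewriting the decimal mark keeps the two characters from being confused, which is exactly what goes wrong in `'1.443.48'`.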

1 Answer


You can strip the `$` and the comma thousands separator, then cast to float:

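import pandas as pd

cw = pd.DataFrame({"campaign": "151515151515", "Media_Cost": "$ 1,443.48"}, index=[0])
# regex=False treats '$' as a literal character rather than a regex end-of-string anchor
cw["Media_Cost"] = cw["Media_Cost"].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
cw.dtypes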

Result:

campaign       object
Media_Cost    float64
dtype: object
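Since the original goal was to `sum()` the column, here is a sketch of that last step, assuming US-style values as in the answer. The extra rows (including the `"n/a"` entry) are made up for illustration. `pd.to_numeric(..., errors="coerce")` is swapped in for `astype(float)`, so a value that still cannot be parsed becomes `NaN` instead of raising `ValueError`:

```python
import pandas as pd

# Hypothetical sample rows in the US-style format used in the answer.
cw = pd.DataFrame({
    "campaign": ["151515151515", "151515151516", "151515151517"],
    "Media_Cost": ["$ 1,443.48", "$ 14.52", "n/a"],
})

cleaned = (
    cw["Media_Cost"]
    .str.replace("$", "", regex=False)  # drop the currency sign
    .str.replace(",", "", regex=False)  # drop the thousands separator
    .str.strip()
)

# errors="coerce" turns anything still unparsable ("n/a") into NaN
# instead of raising ValueError like astype(float) would.
cw["Media_Cost"] = pd.to_numeric(cleaned, errors="coerce")

total = cw["Media_Cost"].sum()  # NaN values are skipped by default (skipna=True)
print(cw.dtypes)
print(total)
```

Whether silently turning bad values into `NaN` is acceptable depends on the data. `astype(float)` is the stricter choice if every row is expected to parse.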
René