Pandas concatenating cols/datatypes, invalid literal for int() with base 10

Question

in my code i try to concatenate 3 fields like so :

final['join_key'] = final.loc[:, ['endOfDay','book','bdr']].apply(lambda x: '|'.join(x.astype(str).to_numpy()), axis=1)

This works fine however I get this as my result, where I dont want 10862.0 to show, I want 10862.

20191031|10862.0|Stack|1354469

ive tried changing astype to np.float64,int,np.int64, as recommended here how to round/remove traling ".0" zeros in pandas column? , but getting this error on column "book" . The datatype is float64 for "book", rest of columns are objects.

"invalid literal for int() with base 10: '0023_TEST'", 'occurred at index 0')

Are there any work arounds to getting 10862 to delete trailing .0's ?

complete error:

runfile('V:/excelguy/scripts/output.py', wdir='V:/excelguy/scripts')
Traceback (most recent call last):

  File "<ipython-input-102-8c7ef12f36ec>", line 1, in <module>
    runfile('V:/excelguy/scripts/output.py', wdir='V:/excelguy/scripts')

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "V:/excelguy/scripts/output.py", line 63, in <module>
    final['join_key'] = final.loc[:, ['endOfDay','book','bdr']].apply(lambda x: '|'.join(x.astype(np.int64).to_numpy()), axis=1)

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 6913, in apply
    return op.get_result()

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\apply.py", line 186, in get_result
    return self.apply_standard()

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\apply.py", line 292, in apply_standard
    self.apply_series_generator()

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\apply.py", line 321, in apply_series_generator
    results[i] = self.f(v)

  File "V:/excelguy/scripts/output.py", line 63, in <lambda>
    final['join_key'] = final.loc[:, ['endOfDay','book','bdr']].apply(lambda x: '|'.join(x.astype(np.int64).to_numpy()), axis=1)

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 5882, in astype
    dtype=dtype, copy=copy, errors=errors, **kwargs

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 581, in astype
    return self.apply("astype", dtype=dtype, **kwargs)

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 438, in apply
    applied = getattr(b, f)(**kwargs)

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 559, in astype
    return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 643, in _astype
    values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)

  File "C:\Users\test\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 707, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)

  File "pandas\_libs\lib.pyx", line 547, in pandas._libs.lib.astype_intsafe

ValueError: ("invalid literal for int() with base 10: '0023_TEST'", 'occurred at index 0')

Need more info, particularly the original data structure. Whats happening from the error trace is that your astype mapping is comming across a string which would fail. Therefore, this either means that you are not applying your transform properly to the only expected column or there are strings in that column. Is there a reason post processing wouldn't work to remove the trailing decimals? How did the decimals get there in the first place. If you wanted to remove them, you could just perform the removal on read/load. — Jason Chia, Dec 02 '19 at 16:53
It looks like the column you try to change the type of contains strings? Please verify. If all entries are "integers" as floats the `df.astype(int)` function should give the desired output. — julian, Dec 02 '19 at 18:07
i try `final['join_key'].astype(int)` and get invalid literal for int() — excelguy, Dec 02 '19 at 18:35
Hi Jason, I ended up doing this , `final['join_key'].astype(str).replace('\.0','',regex=True)` . You can put yours as an answer if you wish, i think it should also work. — excelguy, Dec 05 '19 at 18:25

Pandas concatenating cols/datatypes, invalid literal for int() with base 10

0 Answers0