Python pandas dataframe shorten the conversion time from hex string to int

Question

I want to convert a whole dataframe from hex strings to ints. Currently I can do it using the answer to "pandas dataframe.apply -- converting hex string to int number":

df = df.apply(lambda x: x.astype(str).map(lambda x: int(x, base=16)))

However, it runs very slowly, especially when the dataframe is big. An answer at https://stackoverflow.com/a/52855646/5057185 says that the lambda isn't necessary and adds overhead. I tried to apply that advice, but I got an error:

df2 = pd.read_csv(path+temp_file, dtype=str)
df2 = df2.dropna()
df2 = df2.apply(int,base=16)

df2 = df2.apply(int,base=16)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 6487, in apply
    return op.get_result()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 151, in get_result
    return self.apply_standard()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 257, in apply_standard
    self.apply_series_generator()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 286, in apply_series_generator
    results[i] = self.f(v)
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 78, in f
    return func(x, *args, **kwds)
TypeError: ("int() can't convert non-string with explicit base", u'occurred at index POWERON')

I believe this error occurs because the dtype of the dataframe is object instead of string, and that this is a known problem solved in newer versions of pandas with pd.read_csv(path+temp_file, dtype="string"). I am using an old version of pandas. How can I work around this, or is there another way to convert the dataframe faster?

Can you use the converters parameter of read_csv? Not sure in which version it was implemented. - GhandiFloss, Sep 09 '20 at 09:01
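
One way to read the comment's suggestion is sketched below: pass a converter for every column so the hex strings are parsed while the file is being read. The inline CSV text, the RESET column, and the hex_to_int helper are made up for illustration (POWERON is the only column name visible in the traceback), and the sketch does not measure whether this is faster than converting afterwards.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file from the question.
csv_text = u"POWERON,RESET\n1A,ff\n0x10,7\n"

def hex_to_int(text):
    # Hypothetical helper: parse one cell as a base-16 integer.
    return int(text, 16)

# Read only the header row to learn the column names.
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Apply the same converter to every column while parsing.
df2 = pd.read_csv(io.StringIO(csv_text),
                  converters={name: hex_to_int for name in columns})
print(df2)
```

Converters receive the raw cell text, so a file with empty cells would likely need a helper that handles empty strings instead of passing them straight to int.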

jezrael · Accepted Answer · 2020-09-09T09:05:39.870

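DataFrame.apply passes each whole column to the function as a Series, so int receives a Series rather than a string. I think you need DataFrame.applymap for elementwise processing: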

df2 = df2.applymap(lambda x: int(x,base=16))
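
The difference between apply and applymap can be checked on a small made-up frame (the RESET column and all values are assumptions; only POWERON appears in the traceback). Newer pandas releases rename applymap to DataFrame.map, so this sketch picks whichever one the installed version provides.

```python
import pandas as pd

# Made-up frame of hex strings standing in for df2.
df2 = pd.DataFrame({"POWERON": ["1A", "ff"], "RESET": ["10", "7"]})

# apply calls the function once per column, passing a whole Series.
received = df2.apply(lambda col: type(col).__name__)
print(received.tolist())

# int() rejects a Series when a base is given, as in the question's error.
try:
    int(df2["POWERON"], 16)
    failed = False
except TypeError:
    failed = True
print(failed)

# Element-wise call: DataFrame.map on newer pandas, applymap on older ones.
elementwise = getattr(pd.DataFrame, "map", None) or pd.DataFrame.applymap
converted = elementwise(df2, lambda x: int(x, base=16))
print(converted)
```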

Another idea is to reshape with DataFrame.stack and Series.unstack:

df2 = df2.stack().apply(lambda x: int(x, 16)).unstack()
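
To see what the reshape does, here is a sketch on made-up data (column names and values are assumptions apart from POWERON): stack turns the frame into one long Series indexed by row and column, the conversion runs over that single Series, and unstack restores the original shape.

```python
import pandas as pd

# Made-up frame of hex strings standing in for df2.
df2 = pd.DataFrame({"POWERON": ["1A", "ff"], "RESET": ["10", "7"]})

# One long Series with a (row label, column name) MultiIndex.
stacked = df2.stack()
print(stacked)

# Convert every value, then pivot the column level back out.
converted = stacked.apply(lambda x: int(x, 16)).unstack()
print(converted)
```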

edited Sep 09 '20 at 09:05

answered Sep 09 '20 at 08:56

jezrael
