Rapids.ai / Difference in log computation between pandas and cuDF

Question

Here is my code comparing cuDF and pandas performance:

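import numpy as np
import pandas as pd
import cudf

# Same integer data on the GPU (cuDF) and on the CPU (pandas)
gpuDF2 = cudf.DataFrame({'col_1': np.arange(0, 10_000_000), 'col_2': np.arange(0, 10_000_000)})
pandasDF2 = pd.DataFrame({'col_1': np.arange(0, 10_000_000), 'col_2': np.arange(0, 10_000_000)})

# Natural log of col_1, computed with NumPy in both cases
gpuDF2['log_2'] = np.log(gpuDF2['col_1'])
pandasDF2['log_1'] = np.log(pandasDF2['col_1'])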

How can I get consistent results between the two computations?

I'm unable to reproduce this in the current version of cuDF. — Nick Becker, Jun 28 '22 at 02:28
Colab only supports RAPIDS up to v21.12. You may want to try SageMaker Studio Lab if you need a free GPU to run cuDF. https://rapids.ai/start.html — Nick Becker, Jun 29 '22 at 14:18

score 1 · Answer 1 · answered Aug 06 '22 at 09:40

I can reproduce the issue from the original post. For consistent results, use CuPy instead of NumPy to take the log of the cuDF column. With that change, both computations produce the same answer:

import cudf
import cupy
import numpy as np
import pandas as pd

gpuDF2 = cudf.DataFrame({'col_1': np.arange(0, 10_000_000), 'col_2': np.arange(0, 10_000_000)})
pandasDF2 = pd.DataFrame({'col_1': np.arange(0, 10_000_000), 'col_2': np.arange(0, 10_000_000)})

# GPU column: use CuPy's log; CPU column: use NumPy's log
gpuDF2['log_2'] = cupy.log(gpuDF2['col_1'])
pandasDF2['log_1'] = np.log(pandasDF2['col_1'])

# this passes (default precision: 6 decimal places)
cupy.testing.assert_array_almost_equal(pandasDF2['log_1'], gpuDF2['log_2'])
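If you would rather not rely on `cupy.testing` accepting pandas and cuDF objects directly, the check can also be done on the host. The sketch below is a minimal, hypothetical example, not part of the answer above. It assumes the GPU column has already been copied to the host (for example with cuDF's `Series.to_numpy()`). The helper names `assert_log_columns_match` and `max_finite_abs_diff` are made up for illustration, and the default `rtol=1e-12` is an assumption: GPU and CPU math libraries are not guaranteed to round `log` identically, so you may need to loosen it.

```python
import numpy as np


def assert_log_columns_match(expected, actual, rtol=1e-12):
    """Assert two log-transformed columns agree elementwise on the host.

    `expected` and `actual` can be anything np.asarray accepts, such as a
    NumPy array or a pandas Series. A cuDF Series lives on the GPU, so copy
    it to the host first (assumption: e.g. via its .to_numpy() method).
    """
    expected = np.asarray(expected, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    # Explicit check: assert_allclose would otherwise broadcast a scalar.
    if expected.shape != actual.shape:
        raise AssertionError(f"shape mismatch: {expected.shape} vs {actual.shape}")
    # log(0) is -inf; assert_allclose treats infinities in the same
    # positions as equal, so row 0 of np.arange(0, n) does not fail.
    np.testing.assert_allclose(actual, expected, rtol=rtol)


def max_finite_abs_diff(expected, actual):
    """Largest absolute difference over positions where both values are finite."""
    expected = np.asarray(expected, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    both_finite = np.isfinite(expected) & np.isfinite(actual)
    if not both_finite.any():
        return 0.0
    return float(np.max(np.abs(expected[both_finite] - actual[both_finite])))


if __name__ == "__main__":
    x = np.arange(0, 10_000, dtype=np.int64)
    with np.errstate(divide="ignore"):  # log(0) -> -inf without a warning
        reference = np.log(x)
    # Hypothetical stand-in for a GPU result copied back to the host;
    # in practice this might be gpuDF2['log_2'].to_numpy().
    candidate = reference.copy()
    assert_log_columns_match(reference, candidate)
    print("max finite abs diff:", max_finite_abs_diff(reference, candidate))
```

`max_finite_abs_diff` is there to quantify a mismatch when the assertion fails, which tells you whether you are looking at last-bit rounding differences or a genuinely different result.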
