I have a Modin DataFrame with ~120k rows, and I want to coalesce some of its columns. Iterating over the Modin DataFrame with iterrows takes a long time, so I tried numpy.where instead. On the equivalent pandas DataFrame, numpy.where finishes in 5-10 minutes, but the same operation on the Modin DataFrame takes ~30 minutes. Is there an alternative that speeds this up for a Modin DataFrame?
cols_to_be_coalesced is the list of columns to coalesce. It contains 10-15 columns.
Code:
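import numpy as np
for col in cols_to_be_coalesced:
    # keep df[col] where it is non-empty, otherwise fall back to the matching '<col>_X' column (naming assumed)
    df[col] = np.where(df[col] != '', df[col], df[col + '_X'])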
If df is a pandas DataFrame, this runs in ~10 minutes, but if it is a Modin DataFrame, it takes ~30 minutes. Is there an equivalent of numpy.where for Modin DataFrames that would speed up this operation?
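For reference, here is a minimal sketch of the same coalesce written with Series.where instead of numpy.where, so the whole operation stays inside the DataFrame API rather than going through a NumPy array. It uses plain pandas so that it runs standalone. Whether it is actually faster under Modin has not been measured here. The helper name coalesce_columns, the toy column names, and the "_X" suffix are assumptions made for illustration.

```python
import pandas as pd


def coalesce_columns(df, cols, suffix="_X"):
    """Fill empty-string cells in each column from its suffixed counterpart."""
    for col in cols:
        # Series.where keeps values where the condition is True and
        # takes them from `other` (the fallback column) elsewhere.
        df[col] = df[col].where(df[col] != "", df[col + suffix])
    return df


# Toy data; the column names and the "_X" suffix are hypothetical.
example = pd.DataFrame({
    "a": ["1", "", "3"],
    "a_X": ["x", "2", "y"],
    "b": ["", "q", ""],
    "b_X": ["p", "z", "r"],
})
result = coalesce_columns(example, ["a", "b"])
```

In principle the same function should accept a Modin DataFrame, since Series.where is part of the pandas API that Modin mirrors, but that is an assumption to verify against the Modin version in use.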