I am trying to replace the NaN values of item_price with the mean price for each item_id in the following dask dataframe:

all_data['item_price'] = all_data[['item_id','item_price']].groupby('item_id')['item_price'].apply(lambda x: x.fillna(x.mean()))

all_data.head()

Unfortunately I get the following error:

ValueError: cannot reindex from a duplicate axis

Any idea how to avoid this error or any other way to change nan values to mean values for a dask dataframe?

mj1261829

1 Answer

I found a solution to the problem. Instead of groupby with apply, fillna can be used together with map, which looks up the mean price for each row's item_id:

all_data['item_price'] = all_data['item_price'].fillna(
    all_data['item_id'].map(
        all_data.groupby('item_id')['item_price'].mean().compute()
    )
)

This gets rid of the duplicate axis problem. Note that you have to call compute() on the grouped means inside map, as in the code above, so that map receives a concrete pandas Series; without it the code raises an error.
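To make the fillna/map pattern easier to follow, here is a minimal sketch in plain pandas on a small made-up frame. The toy values and the item_means name are assumptions for illustration only. With a dask dataframe the same pattern applies, with compute() added on the grouped means as in the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two items, each with one missing price.
all_data = pd.DataFrame({
    "item_id": [1, 1, 1, 2, 2],
    "item_price": [10.0, np.nan, 20.0, 5.0, np.nan],
})

# Mean price per item_id; NaN values are skipped by default.
item_means = all_data.groupby("item_id")["item_price"].mean()

# Look up each row's group mean by item_id, then use it only
# where the original price is missing.
all_data["item_price"] = all_data["item_price"].fillna(
    all_data["item_id"].map(item_means)
)

filled_prices = all_data["item_price"].tolist()
print(filled_prices)
```

Because map aligns on the values of item_id rather than on the index, duplicate index labels should not matter for this lookup.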

mdurant
mj1261829