Hi, I am working on a dataset that has a host_id column and two other columns: reviews_per_month and number_of_reviews. For each host_id, most of the values in these two columns are present, but some are zeros. For each column, I want to replace those zeros with the mean of the non-zero values associated with the same host_id. Here is the code I have tried:

import numpy as np
import pandas as pd

def process_rpm_nor(data):
    # Treat missing values as zeros so that both get replaced below
    data['reviews_per_month'] = data['reviews_per_month'].fillna(0)
    data['number_of_reviews'] = data['number_of_reviews'].fillna(0)
    data_list = []

    for host_id in set(data['host_id']):
        # Copy this host's rows so the assignments below do not touch a view
        data_temp = data[data['host_id'] == host_id].copy()

        # Mean of the non-zero values for this host
        nor_non_zero = np.mean(data_temp[data_temp['number_of_reviews'] > 0]['number_of_reviews'])
        rpm_non_zero = np.mean(data_temp[data_temp['reviews_per_month'] > 0]['reviews_per_month'])
        data_temp['number_of_reviews'] = data_temp['number_of_reviews'].replace(0, nor_non_zero)
        data_temp['reviews_per_month'] = data_temp['reviews_per_month'].replace(0, rpm_non_zero)

        data_list.append(data_temp)

    # Stack the per-host frames back together row-wise
    return pd.concat(data_list, axis=0)

The code works, but it takes a long time to run. Could anyone suggest an alternative solution or help me optimize my code? I'd really appreciate the help.

user47
  • Does this answer your question? [Python/Pandas Dataframe replace 0 with median value](https://stackoverflow.com/questions/37506488/python-pandas-dataframe-replace-0-with-median-value) – Amin Rashidbeigi Nov 05 '20 at 20:57
  • I'd say you want to groupby host_id rather than iterating over it. If you can provide sample data and the desired result for testing, I think an answer is not far away. – B. Bogart Nov 05 '20 at 20:59
  • No, that basically replaces 0's with the median of all the values. But I am trying to replace 0 with the average of only those values which are associated with the same host_id, not the entire data. – user47 Nov 05 '20 at 20:59
  • Thanks @B.Bogart, I found a solution using your suggestion :) – user47 Nov 05 '20 at 21:12
  • @user47 - Can you post that solution including some initial data here so others will know in the future? – tdelaney Nov 05 '20 at 21:40
  • Great! Post the solution for all to see!! – B. Bogart Nov 06 '20 at 01:25
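
The solution the asker found was never posted, so what follows is only a sketch of how the groupby suggestion from the comments might look, not the asker's actual code. It assumes the column names from the question; the function name fill_zeros_with_host_mean and its cols parameter are made up for illustration. Instead of looping over every host_id, it computes each host's mean of the positive values with groupby(...).transform("mean") and writes that mean only into the rows that hold a zero. As in the original function, missing values are treated as zeros, and a host with no positive values ends up with NaN.

```python
import pandas as pd


def fill_zeros_with_host_mean(data, cols=("reviews_per_month", "number_of_reviews")):
    """Replace zeros (and NaNs) in each column with the per-host mean of positive values.

    Hypothetical helper: the name and the `cols` parameter are illustrative only.
    """
    out = data.copy()  # leave the caller's frame untouched
    for col in cols:
        # Missing values are treated as zeros; cast to float so means fit the column
        values = out[col].fillna(0).astype(float)
        # Keep only positive entries; everything else becomes NaN and is
        # skipped when the mean is computed
        positive = values.where(values > 0)
        # Per-host mean of the positive entries, broadcast back to every row
        host_mean = positive.groupby(out["host_id"]).transform("mean")
        # Overwrite only the rows that hold a zero
        out[col] = values.mask(values == 0, host_mean)
    return out
```

Calling fill_zeros_with_host_mean(data) returns a new frame with the same row order and index as the input, so no concatenation step is needed afterwards. Because the grouping is done once per column rather than once per host, this should scale much better than the loop when there are many distinct host_id values.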

0 Answers