
I am classifying movies by genre combination (Action Adventure SciFi, Thriller Horror Action, and so on). I get 200 classes, and 50 of them contain only one row when I group by genre. I want to relabel each of these single-occurrence genres as 'Other', so that the count for 'Other' becomes 50.

Please advise on the code.

The dataframe is df and the column name is genre.

Thanks

tom786

1 Answer


You could compute the frequency and use np.where to replace like this:

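import numpy as np

# compute, for every row, how many rows share its genre:
counts = df.groupby('genre')['genre'].transform('size')

# keep genres that occur more than once, label the rest 'Other':
df['new_genre'] = np.where(counts > 1, df['genre'], 'Other')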
Quang Hoang
  • Thanks Quang, I tried what you suggested and got this error: TypeError: Transform function invalid for data types. I then tried tommy = df.Genre.value_counts(); df['GenreN'] = np.where(tommy > 1, df['Genre'], 'Others') and got the following error: ValueError: operands could not be broadcast together with shapes (207,) (1000,) () – tom786 Aug 05 '20 at 05:46
  • s = df.Genre.value_counts(); df['Genre'] = np.where(df['Genre'].isin(s.index[s <= 1]), 'Other', df['Genre']) works. What do s.index and .isin do? – tom786 Aug 05 '20 at 05:56
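
The snippet from the last comment can be unpacked step by step. This is only a minimal sketch on made-up data: the six-row df below is hypothetical and stands in for the real one, and the intermediate names rare and is_rare are introduced purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's dataframe
df = pd.DataFrame(
    {"Genre": ["Action", "Action", "Drama", "Horror", "Drama", "Comedy"]}
)

# value_counts() returns a Series with one entry per distinct genre:
# the index holds the genre labels, the values hold the row counts
s = df["Genre"].value_counts()

# s <= 1 is a boolean mask over that Series; indexing s.index with it
# keeps only the labels of genres that occur once
rare = s.index[s <= 1]

# isin() returns one boolean per row of df: True where the row's genre
# is among the rare labels, so the mask has the same length as df
is_rare = df["Genre"].isin(rare)

# Replace rare genres with 'Other' and keep the rest unchanged
df["Genre"] = np.where(is_rare, "Other", df["Genre"])
print(df["Genre"].tolist())
```

The earlier attempt with np.where(tommy > 1, ...) most likely failed because tommy has one entry per distinct genre (207) while df['Genre'] has one entry per row (1000). The isin mask avoids this because it is built row by row.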