I ran into a problem while working on this Python pandas exercise. I am asked to write a pandas program that displays the most frequent value in a random Series and replaces everything else with 'Other'.

I can't do it with a Series, but I have it mostly working with a DataFrame. The code is shown below. My problem is that I can only select the value at the first index of the frequency counts (i.e. index[0]), which assumes there is only one mode. What if there is more than one mode?

I would be grateful for your help. Thank you!

import numpy as np
import pandas as pd

data19 = np.random.randint(46, 50, size=10)  # 10 random integers from 46 to 49
df19 = pd.DataFrame(data19, columns=["integer"])
print(df19)
freq19 = df19["integer"].value_counts()  # counts, most frequent value first
print(freq19)
find_mode = df19["integer"] == freq19.index[0]  # What if there is more than one mode?
df19.loc[~find_mode, "integer"] = "Other"
print(df19)
ronzenith

1 Answer

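Use Series.mode, which returns every value tied for the highest count, so it also covers the case of more than one mode: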

mode = df19['integer'].mode()
df19.loc[~df19['integer'].isin(mode), 'integer'] = 'Other'
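
Since the exercise asks for a Series rather than a DataFrame, the same idea can be sketched directly on a Series. This is only an illustration: the sample values are made up so that two modes exist, and using `Series.where` after a cast to `object` is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical sample data with two modes (46 and 48 each appear three times).
s = pd.Series([46, 48, 46, 47, 48, 49, 46, 48])

modes = s.mode()  # Series holding every value tied for the highest count

# Keep the values that are modes and replace the rest with "Other".
# Casting to object first avoids putting strings into an integer Series.
result = s.astype(object).where(s.isin(modes), "Other")
print(result.tolist())
```

Here `isin` builds a boolean mask that is True wherever the value is one of the modes, and `where` keeps those entries and substitutes "Other" everywhere else.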
BigBen
  • Thanks! It works, but I don't quite understand the code. In the second line, I'm checking the integer column against the modes, but what does the second 'integer' in that line do? – ronzenith Apr 07 '23 at 16:08
  • It's the second parameter of `loc`, i.e. the same as the `"integer"` in `df19.loc[~find_mode, "integer"] = "Other"` – BigBen Apr 07 '23 at 16:10
  • I see. Let me digest that. – ronzenith Apr 07 '23 at 16:13
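
To make the two arguments of `loc` from the comments concrete, here is a small sketch. The frame and its extra "label" column are hypothetical, added only to show that the column argument limits the assignment to "integer".

```python
import pandas as pd

# Hypothetical two-column frame; "label" exists only to show it is left untouched.
df = pd.DataFrame({"integer": [46, 47, 46, 49], "label": ["a", "b", "c", "d"]})

# Boolean mask: True for rows whose value is NOT a mode.
not_mode = ~df["integer"].isin(df["integer"].mode())

# Cast to object so the column can hold both integers and strings.
df["integer"] = df["integer"].astype(object)

# First argument of loc: which rows (the mask). Second argument: which column.
df.loc[not_mode, "integer"] = "Other"
print(df)
```

Without the second argument, the assignment would apply to every column of the selected rows, so "label" would be overwritten as well.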