I am trying to get the most frequent (dominant) value of a column, so I tried the following code:

df['currency'].value_counts(normalize=True)

which gives me, e.g.

USD    0.800000
CAD    0.100000
EUR    0.050000
GBP    0.050000

The edge cases look like this:

USD    0.500000
CAD    0.500000

or

USD    0.333333
CAD    0.333333
CNY    0.333333

or

USD    0.400000
CAD    0.400000
CNY    0.100000
EUR    0.100000

and so on, where the top frequency is shared by all of the values or by some of them.

What is the best way to detect such edge cases?

In other words, I am trying to find the single most dominant value in the series/column. df['currency'].value_counts().max() does not necessarily identify a unique highest frequency, since several (or all) of the counts returned by df['currency'].value_counts() could be the same. Hence df['currency'].value_counts().idxmax() won't necessarily give the only value having the highest frequency in the column.

daiyue

1 Answer

Demo:

In [104]: df
Out[104]:
  currency
0      USD
1      USD
2      EUR
3      EUR
4      CAD

In [105]: df.currency.mode()
Out[105]:
0    EUR
1    USD
dtype: object

In [106]: len(df.currency.mode()) > 1
Out[106]: True

Now let's "fix" our DF by adding another row with the USD currency, so that there are no "edge" cases any more:

In [107]: df.loc[len(df)] = ['USD']

In [108]: df
Out[108]:
  currency
0      USD
1      USD
2      EUR
3      EUR
4      CAD
5      USD

In [109]: len(df.currency.mode()) > 1
Out[109]: False

In [110]: df.currency.mode()
Out[110]:
0    USD
dtype: object
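
As a rough sketch, the check from the demo can be wrapped in small helpers. The function names `dominant_value` and `tied_top_values` are made up for illustration and are not part of pandas; the sample Series simply mirror the demo data. The first helper relies on `Series.mode()` returning every value that shares the highest count, and the second does the same thing via `value_counts()` in case you also want the counts.

```python
import pandas as pd


def dominant_value(series: pd.Series):
    """Return the single most frequent value, or None if the top count is tied.

    Hypothetical helper: Series.mode() returns all values that share the
    highest count, so more than one result means there is a tie.
    """
    modes = series.mode()
    return modes.iloc[0] if len(modes) == 1 else None


def tied_top_values(series: pd.Series) -> list:
    """Return every value whose count equals the maximum count."""
    counts = series.value_counts()
    return counts[counts == counts.max()].index.tolist()


# Same data as the demo: first with a USD/EUR tie, then with an extra USD row.
tied = pd.Series(["USD", "USD", "EUR", "EUR", "CAD"])
clear = pd.Series(["USD", "USD", "EUR", "EUR", "CAD", "USD"])

print(dominant_value(tied))           # None
print(dominant_value(clear))          # USD
print(sorted(tied_top_values(tied)))  # ['EUR', 'USD']
```

Returning `None` for a tie is just one possible convention; depending on your use case you might prefer to raise an error or return the full list of tied values.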

MaxU - stand with Ukraine
  • `df.loc[len(df)] = ['USD']` gave me `ValueError: cannot set a row with mismatched columns` – daiyue Sep 08 '17 at 10:50
  • @daiyue, that was just a demo - you shouldn't add rows to your DF... I just wanted to show the different output of `df.currency.mode()` when there are no "edge" cases... – MaxU - stand with Ukraine Sep 08 '17 at 10:53