I am trying to sort DataFrame column values alphabetically in conjunction with value_counts().

Below is a code snippet of my algorithm:

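with open(f_out_txt_2, 'w', encoding='utf-8') as f_txt_out_2:
    f_txt_out_2.write("SORTED First Names w/SORTED value counts:\n")
    # sort expects a bool, not the string 'True'; value_counts() sorts by count (descending) by default
    # .items() replaces .iteritems(), which was removed in pandas 2.0
    for val, cnt in df['First Name'].value_counts(sort=True).items():
        f_txt_out_2.write("\n{0:9s}  {1:2d}".format(val, cnt))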

Below are the first few lines of output. Note that the "First Name" values are not in alphabetical order within each count. How can I get the "First Name" values sorted while keeping the value counts sorted?

Output:
SORTED First Names w/SORTED value counts:

Marilyn    11
Todd       10
Jeremy     10
Barbara    10
Sarah       9
Rose        9
Kathy       9
Steven      9
Irene       9
Cynthia     9
Carl        8
Alice       8
Justin      8
Bobby       8
Ruby        8
Gloria      8
Julie       8
Clarence    8
Harry       8
Andrea      8

Unfortunately, I can't find the original source link for the "employee.csv" file I downloaded, but here is a sample of it to give an idea of what it contains:

[Image: sample of employee.csv]

1 Answer

I believe you would use the following code to sort by first name, then by value counts.

dfg = df.groupby('First Name').agg(value_count = ('First Name','count')).sort_values(by = ['First Name','value_count'], ascending = [True,False])
rhug123
  • There is no column named "value counts", so this raises: Exception has occurred: KeyError 'value counts' – user1995818 Jul 04 '20 at 12:18
  • I thought there was already a column named value counts. I edited the code, please try it now and see if it works. – rhug123 Jul 07 '20 at 00:11
  • Thanks user13802115. Below is how I implemented your code above. The output has the First Names sorted, but the counts are not sorted by group. The problem may be in the way I implemented your code. – user1995818 Jul 08 '20 at 02:36
  • dfg = df.groupby('First Name').agg(value_count = ('First Name','count')).sort_values(by = ['First Name','value_count'], ascending = [True,False]) for a, b in dfg.iteritems(): for val, cnt in b.iteritems(): f_txt_out_2.write("\n{0:9s} {1:2d}".format(val, cnt)) – user1995818 Jul 08 '20 at 02:45
  • output is like this - Aaron 4 Adam 4 Alan 3 Albert 6 Alice 8 Amanda 6 Amy 5 Andrea 8 Andrew 2 ... – user1995818 Jul 08 '20 at 02:47
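
For the ordering the question asks for (counts descending, names alphabetical within each count), one possible approach is to make the count the first sort key and the name the second. The sketch below is a minimal illustration rather than the answerer's code: the small sample DataFrame is hypothetical and stands in for employee.csv, and the column name value_count is borrowed from the answer above.

```python
import pandas as pd

# Hypothetical sample data standing in for the "First Name" column of employee.csv.
df = pd.DataFrame(
    {"First Name": ["Todd", "Barbara", "Todd", "Alice", "Barbara", "Carl"]}
)

# Count each name, then turn the result into a two-column DataFrame.
counts = (
    df["First Name"]
    .value_counts()
    .rename_axis("First Name")
    .reset_index(name="value_count")
)

# Primary key: count, descending. Tie-breaker: name, ascending.
counts = counts.sort_values(
    by=["value_count", "First Name"], ascending=[False, True]
)

# Format each row the same way as the original write() call.
lines = [
    "{0:9s}  {1:2d}".format(name, cnt)
    for name, cnt in zip(counts["First Name"], counts["value_count"])
]
print("\n".join(lines))
```

With the sort keys in the other order (name first), every name is already unique after grouping, so the count never acts as a tie-breaker and the result is purely alphabetical, which matches the output reported in the comments.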