
I have a dictionary of {key: count}, say status_count = {'MANAGEMENT ANALYSTS': 13859, 'COMPUTER PROGRAMMERS': 72112}, and I am trying to write a key function for heapq.nlargest() that sorts by count and, if there are ties, by alphabetical order (a-z) of the keys. I have to use heapq.nlargest because N is very large and k = 10 is small.

This is what I have so far:

top_k_results = heapq.nlargest(args.top_k, status_count.items(), key=lambda item: (item[1], item[0]))

But this breaks ties incorrectly: keys with equal counts come out in reverse alphabetical order (z-a) instead of a-z. Please help!

jpp
pulsar

1 Answer


Simplest may be to switch to heapq.nsmallest and redefine your sort key:

from heapq import nsmallest

def sort_key(x):
    return -x[1], x[0]

top_k_results = nsmallest(args.top_k, status_count.items(), key=sort_key)
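
To see the tie-break in action, here is a minimal sketch on made-up data. The extra dictionary entries and the literal k = 3 are illustrative assumptions, not part of the question. Negating the count makes the largest count the "smallest" key, while the name is compared as-is, so ties come out a-z:

```python
from heapq import nsmallest

# Hypothetical sample data with a tie on the count 72112.
status_count = {
    'MANAGEMENT ANALYSTS': 13859,
    'COMPUTER PROGRAMMERS': 72112,
    'ACCOUNTANTS': 72112,
    'SOFTWARE DEVELOPERS': 90000,
}

def sort_key(item):
    name, count = item
    # Negated count: highest counts first; name: a-z among equal counts.
    return -count, name

top_k_results = nsmallest(3, status_count.items(), key=sort_key)
print(top_k_results)
# [('SOFTWARE DEVELOPERS', 90000), ('ACCOUNTANTS', 72112), ('COMPUTER PROGRAMMERS', 72112)]
```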

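Alternatively, you can keep nlargest and negate the code point (ord) of each character in the key, so that names compare in reverse and ties come out in ascending (a-z) order: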

from heapq import nlargest

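def sort_key(x):
    # Negate each character's code point so that a larger key means an earlier name.
    # The trailing 1 ranks a name ahead of any longer name that starts with it.
    return x[1], [-ord(c) for c in x[0]] + [1]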

top_k_results = nlargest(args.top_k, status_count.items(), key=sort_key)
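
If negating characters one by one feels fragile, another option is to wrap the whole string in a small class that inverts comparisons. This is only a sketch: the ReverseOrder class and the sample data are hypothetical names introduced here, not something heapq provides. Because the comparison is delegated to the full string, it works for names of any length, including the case where one name is a prefix of another:

```python
from functools import total_ordering
from heapq import nlargest

@total_ordering
class ReverseOrder:
    """Hypothetical helper: wraps a value and inverts its ordering."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Inverted on purpose: "less than" here means the wrapped value is greater.
        return self.value > other.value

# Made-up data with a tie where one name is a prefix of the other.
status_count = {'ANALYST': 5, 'ANALYSTS': 5, 'CLERKS': 9, 'BAKERS': 1}

def sort_key(item):
    name, count = item
    return count, ReverseOrder(name)

top_k_results = nlargest(3, status_count.items(), key=sort_key)
print(top_k_results)
# [('CLERKS', 9), ('ANALYST', 5), ('ANALYSTS', 5)]
```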

Remember to use str.casefold if you need to normalize the case of your string.
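
As a rough illustration of the case issue (the data below is invented for the example): in a plain string comparison every uppercase ASCII letter sorts before every lowercase one, so 'Beta' would come before 'alpha'. Applying str.casefold inside the key makes the tie-break case-insensitive:

```python
from heapq import nsmallest

# Hypothetical data: mixed-case names with a tie on the count 5.
status_count = {'alpha': 5, 'Beta': 5, 'gamma': 9}

def sort_key(item):
    name, count = item
    # casefold() normalizes case so 'alpha' sorts before 'Beta'.
    return -count, name.casefold()

top_k_results = nsmallest(3, status_count.items(), key=sort_key)
print(top_k_results)
# [('gamma', 9), ('alpha', 5), ('Beta', 5)]
```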

jpp
  • Thank you for the answer. I have strings with length > 1 and `ord()` would accept only strings of length 1 I guess. Is there a way to overcome this? – pulsar Oct 30 '18 at 10:43