37

How do I retrieve the top 3 items from a dictionary?

>>> d
{'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2, 'work': 2, 'will': 2, 'as': 2, 'test': 4}

Expected result:

and: 23
only: 21
this: 14
Martijn Pieters
shantanuo

4 Answers

54

Use collections.Counter:

>>> from collections import Counter
>>> d = Counter({'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2, 'work': 2, 'will': 2, 'as': 2, 'test': 4})
>>> d.most_common()
[('and', 23), ('only.', 21), ('this', 14), ('test', 4), ('a', 2), ('is', 2), ('work', 2), ('will', 2), ('as', 2)]
>>> for k, v in d.most_common(3):
...     print '%s: %i' % (k, v)
... 
and: 23
only.: 21
this: 14

Counter objects offer various other advantages, such as making it almost trivial to collect the counts in the first place.
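As a minimal sketch of that last point (written in Python 3 syntax; the sample sentence is made up purely for illustration), a Counter can be built directly from an iterable of words, with no manual bookkeeping:

```python
from collections import Counter

# Hypothetical sample text, chosen only for illustration.
text = "this is a test and this test is only a test"

# Counter tallies each word as it consumes the iterable.
counts = Counter(text.split())

# most_common(3) returns the three highest counts as (word, count) pairs.
top3 = counts.most_common(3)
for word, count in top3:
    print("%s: %i" % (word, count))
```

The same `most_common(3)` call then works on the freshly collected counts exactly as it does on the dictionary from the question.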

Martijn Pieters
29
>>> d = {'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2, 'work': 2, 'will': 2, 'as': 2, 'test': 4}
>>> t = sorted(d.iteritems(), key=lambda x:-x[1])[:3]

>>> for x in t:
...     print "{0}: {1}".format(*x)
... 
and: 23
only.: 21
this: 14
Maria Zverina
  • 2
    Agreed that Counter is the better way to go if you want to count things as well. But if you just want the top 3 values in an already created dict, it seems like overkill. :) – Maria Zverina Aug 10 '12 at 13:36
  • 1
    That depends on the size of the dictionary. Sorting the dictionary is O(n log n), creating a Counter and extracting the `k` largest is only O(n log k). For large `n` and small `k` that makes the Counter option much more efficient. – Duncan Aug 10 '12 at 14:20
  • 2
    Actually, for just 3 top values, I'd use the `heapq.nlargest()` function; it is more efficient than sorting the whole sequence. That is what `Counter()` uses internally. – Martijn Pieters May 14 '13 at 08:20
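The `heapq.nlargest()` suggestion from the last comment could look roughly like this. It is a sketch in Python 3 syntax that reuses the dictionary from the question; `operator.itemgetter(1)` is one way to select the value of each `(key, value)` pair, and a lambda would work equally well:

```python
import heapq
from operator import itemgetter

d = {'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2,
     'work': 2, 'will': 2, 'as': 2, 'test': 4}

# Keep only the 3 pairs with the largest values, without sorting all of d.
top3 = heapq.nlargest(3, d.items(), key=itemgetter(1))

for key, value in top3:
    print("{0}: {1}".format(key, value))
# prints:
# and: 23
# only.: 21
# this: 14
```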
4

The answers you already have are correct; however, I would define my own key function to pass to sorted().

d = {'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2, 'work': 2, 'will': 2, 'as': 2, 'test': 4}

# create a function which returns the value stored in d for a given key
def keyfunction(k):
    return d[k]

# sort the dictionary's keys by their values and print the top 3 "key: value" pairs
for key in sorted(d, key=keyfunction, reverse=True)[:3]:
    print "%s: %i" % (key, d[key])
Gianluca
4

Here is a benchmark of the solutions above:

import collections
import datetime

def most_popular(L):
  # L is a dict; time two ways of extracting its 2 highest-valued entries.
  # Note: delta.microseconds is only the sub-second part of the elapsed time.

  # using sorted() with a lambda key
  start = datetime.datetime.now()
  res = dict(sorted([(k, v) for k, v in L.items()], key=lambda x: x[1])[-2:])
  delta = datetime.datetime.now() - start
  print "Microtime (lambda:%d):" % len(L), str(delta.microseconds)

  # using collections.Counter
  start = datetime.datetime.now()
  res = dict(collections.Counter(L).most_common()[:2])
  delta = datetime.datetime.now() - start
  print "Microtime (collections:%d):" % len(L), str(delta.microseconds)

# dict of 10 entries
most_popular({el:0 for el in list(range(10))})

# dict of 100 entries
most_popular({el:0 for el in list(range(100))})

# dict of 1000 entries
most_popular({el:0 for el in list(range(1000))})

# dict of 10000 entries
most_popular({el:0 for el in list(range(10000))})

# dict of 100000 entries
most_popular({el:0 for el in list(range(100000))})

# dict of 1000000 entries
most_popular({el:0 for el in list(range(1000000))})

The test datasets are dicts ranging in size from 10^1 to 10^6 entries, such as:

print {el:0 for el in list(range(10))}
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}

This gives the following benchmark results:

Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.8.2] on linux

Microtime (lambda:10): 24
Microtime (collections:10): 106
Microtime (lambda:100): 49
Microtime (collections:100): 50
Microtime (lambda:1000): 397
Microtime (collections:1000): 178
Microtime (lambda:10000): 4347
Microtime (collections:10000): 2782
Microtime (lambda:100000): 55738
Microtime (collections:100000): 26546
Microtime (lambda:1000000): 798612
Microtime (collections:1000000): 361970
=> None

So for small dicts the sorted() approach with a lambda is faster, while for large dicts collections.Counter performs better.

See the benchmark running here.
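For anyone who wants to repeat the comparison, here is a rough sketch of how it could be set up with the standard `timeit` module in Python 3. It differs from the benchmark above in several ways that are my own assumptions: the values are distinct and shuffled rather than all zero, `most_common(3)` is called with an explicit count so that it can select rather than sort (as CPython does via `heapq.nlargest`), and a direct `heapq.nlargest` variant is included. The function names and `make_data` helper are hypothetical, and no timings are claimed here.

```python
import heapq
import random
import timeit
from collections import Counter
from operator import itemgetter


def top_sorted(d, n=3):
    # Sort every item by value, then slice off the n largest.
    return sorted(d.items(), key=lambda kv: kv[1], reverse=True)[:n]


def top_counter(d, n=3):
    # Copy into a Counter and ask for the n most common entries.
    return Counter(d).most_common(n)


def top_heapq(d, n=3):
    # Direct heap-based selection, skipping the Counter copy.
    return heapq.nlargest(n, d.items(), key=itemgetter(1))


def make_data(size, seed=0):
    # Hypothetical test data: distinct values in shuffled order.
    values = list(range(size))
    random.Random(seed).shuffle(values)
    return dict(enumerate(values))


if __name__ == "__main__":
    for size in (10, 1000, 100000):
        data = make_data(size)
        for func in (top_sorted, top_counter, top_heapq):
            # Best of 5 single-call runs, in seconds.
            best = min(timeit.repeat(lambda: func(data), number=1, repeat=5))
            print("%-12s n=%-7d %.6f s" % (func.__name__, size, best))
```

Taking the minimum of several repeats reduces noise from other processes, which matters most for the small inputs where a single run lasts only microseconds.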

loretoparisi