2

I have the following dictionary and am trying to make a pie chart from them but I want to only include the top 5 (they are already sorted by value here) and then sum the others together in an Other category i.e. replace Publishing, Fashion, Food etc. with just one Other which sum them all together. Stuck with how to do this so would appreciate any help!

{'Games': 715067930.8599964,
 'Design': 705237125.089998,
 'Technology': 648570433.7599969,
 'Film & Video': 379559714.56000066,
 'Music': 191227757.8699999,
 'Publishing': 130763828.65999977,
 'Fashion': 125678824.47999984,
 'Food': 122781563.58000016,
 'Art': 89078801.8599998,
 'Comics': 70600202.99999984,
 'Theater': 42662109.69999992,
 'Photography': 37709926.38000007,
 'Crafts': 13953818.35000002,
 'Dance': 12908120.519999994,
 'Journalism': 12197353.370000007}

At the moment my pie chart is really overcrowded using this code

groupbycategorypledge = df.groupby('main_category')['usd_pledged_real'].sum().sort_values(ascending=False)
plt.figure(figsize=(20, 10))
pie = groupbycategorypledge.plot(kind='pie', startangle=90, radius=0.7, title='Amount Pledged by Main category',autopct='%1.1f%%',labeldistance=1.2)
plt.legend(loc=(1.05,0.75))
plt.ylabel('')

so I have

dict = groupbycategorypledge.sort_values(ascending=False).to_dict()
jpp
  • 159,742
  • 34
  • 281
  • 339

2 Answers2

1

You can manipulate your dictionary before using Pandas:

from operator import itemgetter

# sort by value descending
items_sorted = sorted(d.items(), key=itemgetter(1), reverse=True)

# calculate sum of others
others = ('Other', sum(map(itemgetter(1), items_sorted[5:])))

# construct dictionary
d = dict([*items_sorted[:5], others])

print(d)

{'Games': 715067930.8599964,
 'Design': 705237125.089998,
 'Technology': 648570433.7599969,
 'Film & Video': 379559714.56000066,
 'Music': 191227757.8699999,
 'Other': 658334549.8999995}
jpp
  • 159,742
  • 34
  • 281
  • 339
0

Building on the idea of @jpp, but using a heap:

import heapq

d = {'Games': 715067930.8599964,
     'Design': 705237125.089998,
     'Technology': 648570433.7599969,
     'Film & Video': 379559714.56000066,
     'Music': 191227757.8699999,
     'Publishing': 130763828.65999977,
     'Fashion': 125678824.47999984,
     'Food': 122781563.58000016,
     'Art': 89078801.8599998,
     'Comics': 70600202.99999984,
     'Theater': 42662109.69999992,
     'Photography': 37709926.38000007,
     'Crafts': 13953818.35000002,
     'Dance': 12908120.519999994,
     'Journalism': 12197353.370000007}

top_5 = set(heapq.nlargest(5, d, key=d.get))

groups = {}
for category, pledge in d.items():
    new_category = category if category in top_5 else 'Other'
    groups.setdefault(new_category, []).append(pledge)

result = {k: sum(v) for k, v in groups.items()}
print(result)

Output

{'Technology': 648570433.7599969, 'Design': 705237125.089998, 'Other': 658334549.8999994, 'Games': 715067930.8599964, 'Film & Video': 379559714.56000066, 'Music': 191227757.8699999}

Or if you fancy numpy:

import numpy as np

d = {'Games': 715067930.8599964,
     'Design': 705237125.089998,
     'Technology': 648570433.7599969,
     'Film & Video': 379559714.56000066,
     'Music': 191227757.8699999,
     'Publishing': 130763828.65999977,
     'Fashion': 125678824.47999984,
     'Food': 122781563.58000016,
     'Art': 89078801.8599998,
     'Comics': 70600202.99999984,
     'Theater': 42662109.69999992,
     'Photography': 37709926.38000007,
     'Crafts': 13953818.35000002,
     'Dance': 12908120.519999994,
     'Journalism': 12197353.370000007}

categories, pledge_values = map(np.array, zip(*d.items()))
partition = np.argpartition(pledge_values, -5)
top_5 = set(categories[partition][-5:])

groups = {}
for category, pledge in d.items():
    new_category = category if category in top_5 else 'Other'
    groups.setdefault(new_category, []).append(pledge)

result = {k: sum(v) for k, v in groups.items()}
print(result)

Output

{'Technology': 648570433.7599969, 'Design': 705237125.089998, 'Other': 658334549.8999995, 'Music': 191227757.8699999, 'Games': 715067930.8599964, 'Film & Video': 379559714.56000066}

The complexity of the second proposal (using numpy) is O(n) where n is the number of key, value pairs of d.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76