I have a dataset with thousands of elements and their respective frequencies. I need to plot a histogram of the top 10 most frequent elements.
I did:

  top_words = Counter(my_data).most_common()  
  top_words_10 = top_words[:10]  
  plt.hist(top_words_10,label='True')    

and got this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-29-ff974b3a2354> in <module>()  
      5  print top_words[:10]  
      6   
----> 7 plt.hist(top_words_10)    
C:\Anaconda\lib\site-packages\numpy\core\_methods.pyc in _amin(a, axis, out, keepdims)  
     12 def _amin(a, axis=None, out=None, keepdims=False):  
     13     return um.minimum.reduce(a, axis=axis,  
---> 14                             out=out, keepdims=keepdims)  
     15   
     16 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):  


TypeError: cannot perform reduce with flexible type

Any ideas? My data looks like this:

[(' whitefield', 65299), (' bellandur', 57061), (' kundalahalli', 51769), (' marathahalli', 50639), (' electronic city', 44041), (' sarjapur road junction', 34164), (' indiranagar 2nd stage', 32459), (' malleswaram', 32171), (' yelahanka main road', 28901), (' domlur', 28869)]
Hypothetical Ninja

2 Answers


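You get this error because plt.hist expects numeric data, but your list holds (string, count) tuples, so the array built from it contains strings. Separate the labels from the counts and plot the counts as bars: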

import matplotlib.pyplot as plt
import numpy as np

data = [(' whitefield', 65299), (' bellandur', 57061), (' kundalahalli', 51769), (' marathahalli', 50639),
(' electronic city', 44041), (' sarjapur road junction', 34164), (' indiranagar 2nd stage', 32459),
(' malleswaram', 32171), (' yelahanka main road', 28901), (' domlur', 28869)]

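frequency = []
words = []

# Each item is a (word, count) tuple: index 0 is the label, index 1 the count.
for line in data:
    frequency.append(line[1])
    words.append(line[0])

# One bar position per word.
y_axis = np.arange(1, len(words) + 1, 1)

plt.barh(y_axis, frequency, align='center')
plt.yticks(y_axis, words)  # label each bar with its word
plt.show()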
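
For a vertical chart, the same split can be done in one step with zip and the bars drawn with bar instead of barh. This is only a sketch on the first three pairs of the data above: the helper name plot_vertical, the label rotation, and stripping the leading space from each label are my assumptions, not part of the answer.

```python
data = [(' whitefield', 65299), (' bellandur', 57061), (' kundalahalli', 51769)]

# zip(*data) transposes the list of pairs into a tuple of words and a tuple of counts.
words, counts = zip(*data)
words = [w.strip() for w in words]  # drop the leading space in each label


def plot_vertical(words, counts):
    """Draw a vertical bar chart (hypothetical helper for illustration)."""
    # Imported here so the unpacking above runs even without matplotlib installed.
    import matplotlib.pyplot as plt

    positions = list(range(len(words)))
    fig, ax = plt.subplots()
    ax.bar(positions, counts, align='center')
    ax.set_xticks(positions)
    ax.set_xticklabels(words, rotation=45, ha='right')  # tilt long labels
    fig.tight_layout()
    return fig
```

Call plot_vertical(words, counts) and then matplotlib.pyplot.show() to display the figure.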
Vlad Sonkin
  • Thanks a ton, it works great. How can I swap the axes? I want the bars vertical. And please explain the for loop. – Hypothetical Ninja Jan 31 '14 at 12:41
  • See this answer [link](http://stackoverflow.com/questions/17074772/using-text-on-y-axis-in-matplotlib-instead-of-numbers). In the for loop I just iterate over the list of tuples such as `(' whitefield', 65299)`, take each element by index, and store them in separate lists. – Vlad Sonkin Jan 31 '14 at 19:09
  • There should be some chart that takes a list of categorical values as input and plots the counts. This should be available out of the box; it's the simplest chart one can imagine. – ksha May 13 '17 at 09:44

The problem is that plt.hist tries to use numpy.histogram to make a histogram from the data you pass in, and that only works on numeric data.

You already have the counts, so you just want to use bar:

import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
words, counts = zip(*top_words_10)  # unpack the (word, count) pairs from the question into two tuples
ax.bar(range(len(counts)), counts, align='center')  # bar heights are the counts
ax.set_xticks(range(len(counts)))
ax.set_xticklabels(words)  # this is about the _only_ use for set_xticklabels
plt.show()

See this example and the documentation.
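
If you start from the raw elements rather than from the pairs, Counter can pick the top entries directly. A minimal sketch, assuming my_data is a flat list with one entry per occurrence; the sample values here are made up for illustration.

```python
from collections import Counter

# Hypothetical raw data: one entry per occurrence of each element.
my_data = ['a', 'b', 'a', 'c', 'a', 'b']

# most_common(n) returns the n most frequent (element, count) pairs,
# highest count first, so no slicing is needed afterwards.
top = Counter(my_data).most_common(2)

# Split the pairs into labels (for the ticks) and counts (for the bar heights).
labels = [word for word, _ in top]
counts = [count for _, count in top]
```

With real data, use most_common(10) and pass counts to ax.bar and labels to ax.set_xticklabels.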

tacaswell
  • Hi, thanks, but I got an error: File "", line 6 ax.set_xticklabels(words) # this is about the _only_ use for set_xticklabels ^ SyntaxError: invalid syntax – Hypothetical Ninja Jan 31 '14 at 12:52
  • Where is it missing? Sorry, I'm a newbie, so I'm seeing all of this for the first time ;) – Hypothetical Ninja Jan 31 '14 at 12:58
  • In set_xticks. Learning to debug effectively is the most important skill when learning to program. Syntax errors are the nice ones: they tell you where they are to within a few lines. – tacaswell Jan 31 '14 at 13:03