
I have tried to research this problem without success. I'm quite a beginner at Python, so bear with me.

I have a text file containing one number per line (angles in degrees). I want to first cluster the angles, using a distance threshold of 20, and then plot the result as a histogram. I have the following code:

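import numpy
from cluster import HierarchicalClustering

# Read one angle (in degrees) per line
with open(output_dir + '/chi_angle.txt', 'r') as f:
    angle = f.read().splitlines()
# A list comprehension instead of map(), which is lazy in Python 3
array = numpy.array([float(a) for a in angle])
hello = list(array)
# Cluster on absolute difference and cut the hierarchy at a threshold of 20
cl = HierarchicalClustering(hello, lambda x, y: abs(x - y))
clusters = cl.getlevel(20)
frequency = [len(x) for x in clusters]  # number of angles per cluster
average = [1.0 * sum(x) / len(x) for x in clusters]  # mean angle per cluster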

Now, my question is: how do I plot the histogram?

Doing the following:

import pylab

pylab.hist(average, bins=50)
pylab.xlabel('Chi 1 Angle [degrees]')
pylab.ylabel('#')
pylab.show()

will show a histogram with bars correctly placed (i.e. at the average of each cluster), but it won't show how many angles each cluster contains.

Just for clarification, the clustered data looks like this:

clusters = [[-60.26, -30.26, -45.24], [163.24, 173.24], [133.2, 123.23, 121.23]]

I want the mean of each cluster and the number of angles in each cluster. On the histogram, the first bar will thus be located at around -50 and will have a height of 3. How do I plot this?
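A minimal sketch of the plot described above, assuming matplotlib is available: the per-cluster means become the bar positions and the per-cluster counts become the bar heights. The function names `cluster_summary` and `plot_cluster_bars`, and the bar width of 10 degrees, are illustrative choices and not part of the original code.

```python
def cluster_summary(clusters):
    """Return (means, counts): one mean and one size per cluster."""
    means = [sum(c) / float(len(c)) for c in clusters]
    counts = [len(c) for c in clusters]
    return means, counts


def plot_cluster_bars(means, counts, width=10.0):
    """Draw one bar per cluster, centred on its mean, as tall as its size."""
    # Imported here so the summary above also works without matplotlib.
    import matplotlib.pyplot as plt

    plt.bar(means, counts, width=width)  # width is an assumed, arbitrary value
    plt.xlabel('Chi 1 Angle [degrees]')
    plt.ylabel('#')
    plt.show()


# Example data taken from the question.
clusters = [[-60.26, -30.26, -45.24], [163.24, 173.24], [133.2, 123.23, 121.23]]
means, counts = cluster_summary(clusters)
```

Calling `plot_cluster_bars(means, counts)` would then draw three bars, with heights 3, 2 and 3.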

Thanks a lot!

2 Answers


Not sure I understood your question. Anyhow, try saving the return value of hist in a variable:

 H=hist(average, bins=50)

If you want to plot it, then do

 plot(H[1][1:],H[0])

H[1] is an array that stores the bin edges (one more entry than there are bins), so H[1][1:] is the right-hand edge of each bin, and H[0] holds the counts in each bin. I hope this helped.
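To plot the counts at the middle of each bin and not at an edge, the centers can be computed from consecutive edges. The sketch below assumes numpy and pylab are installed; `bin_centers` and `plot_counts_at_centers` are hypothetical helper names, and `numpy.histogram` is used here in place of `hist` so that nothing is drawn until the final plot.

```python
def bin_centers(edges):
    """Midpoint of each bin, given the len(counts) + 1 bin edges."""
    return [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]


def plot_counts_at_centers(values, bins=50):
    """Plot the histogram counts against the bin centers."""
    # Imported here so bin_centers can be used without numpy or pylab.
    import numpy
    import pylab

    counts, edges = numpy.histogram(values, bins=bins)
    pylab.plot(bin_centers(edges), counts)
    pylab.show()


# Three bins of width 10 have four edges and three centers.
centers = bin_centers([0.0, 10.0, 20.0, 30.0])
```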

Brian

Why don't you just use a histogram right away?

A histogram of cluster centers is not a very sensible representation of your data.

Has QUIT--Anony-Mousse
  • I know what you mean, but I need it because I will have 300+ angles. And I just want to get an idea of what conformation (it is a protein) the specific amino acid has. – user1572691 Aug 15 '12 at 15:57
  • For a histogram, 300+ angles does not make a difference. That's *exactly* what you use histograms for: large samples when you are only interested in the overall distribution. Each histogram bin *collects* a number of angles... Did you *try* doing an angle histogram? What *exactly* is wrong with the histogram? If you want a simpler histogram, use fewer bins. – Has QUIT--Anony-Mousse Aug 15 '12 at 22:39
  • I see what you mean now. Using fewer bins worked, thanks a lot! – user1572691 Aug 16 '12 at 15:23
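The approach the comments settle on, a plain histogram of the raw angles with fewer and wider bins, might look like the sketch below. It assumes matplotlib is installed and that the angles fall between -180 and 180 degrees; the 20-degree bin width echoes the threshold from the question, and both helper names are hypothetical.

```python
def fixed_width_edges(low=-180, high=180, width=20):
    """Bin edges from low to high (inclusive) in steps of width."""
    return list(range(low, high + 1, width))


def plot_angle_histogram(angles, edges):
    """Histogram of the raw angles using explicit bin edges."""
    # Imported here so the edge helper works without matplotlib.
    import matplotlib.pyplot as plt

    plt.hist(angles, bins=edges)
    plt.xlabel('Chi 1 Angle [degrees]')
    plt.ylabel('#')
    plt.show()


# 18 bins of 20 degrees each, under the assumed -180..180 range.
edges = fixed_width_edges()
```

With the angles read from the file as in the question, `plot_angle_histogram(hello, edges)` would show how many angles fall in each 20-degree interval, with no clustering step needed.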