
I am using matplotlib.pyplot to create histograms. I'm not actually interested in the plots themselves, only in the frequencies and bins (I know I can write my own code to do this, but I would prefer to use this package).

I know I can do the following:

import numpy as np
import matplotlib.pyplot as plt

x1 = np.random.normal(1.5, 1.0, 1000)  # 1000 samples
x2 = np.random.normal(0, 1.0, 1000)

freq, bins, patches = plt.hist([x1, x2], 50, histtype='step')

to create a histogram. All I need is freq[0], freq[1], and bins[0]. The problem occurs when I try to use

freq, bins, patches = plt.hist([x1, x2], 50, histtype='step')

in a function. For example,

def func(x, y, Nbins):
    freq, bins, patches = plt.hist([x, y], Nbins, histtype='step')  # create histogram

    bincenters = 0.5*(bins[1:] + bins[:-1])  # center bins

    xf = [float(i) for i in freq[0]]  # convert counts to float
    yf = [float(i) for i in freq[1]]

    # (bin center, 1 / combined count) for every non-empty bin
    p = [(bincenters[j], 1.0 / (xf[j] + yf[j])) for j in range(Nbins) if (xf[j] + yf[j]) != 0]

    Xt = [i for i, j in p]  # separate pairs formed in p: bin centers
    Yt = [j for i, j in p]  # 1 / combined count

    X = np.array(Xt)  # convert to arrays for later fitting
    Y = np.array(Yt)

    return X, Y  # return arrays X and Y

When I call func(x1, x2, Nbins) and plot or print X and Y, I do not get the curve/values I expect. I suspect it has something to do with plt.hist, since a partial histogram appears in my plot.

Gus
user1175720
  • Why don't you use np.histogram()? – Pablo Jun 27 '13 at 21:00
  • Thanks for the suggestion. It looks like the problem lies elsewhere. If I run the above code line by line (not as a function), it works with both np.histogram() and plt.hist(). Any ideas on why it does not work inside a function? – user1175720 Jun 28 '13 at 18:23

4 Answers


You can use np.histogram2d (for a 2D histogram) or np.histogram (for a 1D histogram):

hst = np.histogram(A, bins)
hst2d = np.histogram2d(X,Y,bins)

The counts and bin edges are the same as those returned by plt.hist and plt.hist2d; the only difference is that nothing is plotted (so no patches or image are returned).
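A quick way to check this claim is to compute the same histograms both ways and compare them. This is a sketch assuming a reasonably recent NumPy and Matplotlib; the Agg backend is selected only so that no window opens, and the sample data are made up for the comparison.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, used here only so no window opens
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
a = rng.normal(1.5, 1.0, 1000)  # made-up sample data
b = rng.normal(0.0, 1.0, 1000)

# NumPy: counts and edges only, nothing is drawn
counts, edges = np.histogram(a, bins=50)
h2d, xedges, yedges = np.histogram2d(a, b, bins=20)

# Matplotlib: the same numbers, plus the drawn artists
fig, ax = plt.subplots()
n, bins, patches = ax.hist(a, bins=50)
h2d_plt, xe_plt, ye_plt, image = ax.hist2d(a, b, bins=20)
plt.close(fig)

print(np.array_equal(counts, n), np.allclose(edges, bins))
print(np.array_equal(h2d, h2d_plt))
```

Note that plt.hist returns the counts as floats, while np.histogram returns integers; the values themselves match.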

Pjer

I'm not sure I understand your question correctly, but here is an example of a very simple home-made histogram (1D and 2D), each inside a function and called properly:

import numpy as np
import matplotlib.pyplot as plt

def func2d(x, y, nbins):
    # 2D counts plus the bin edges along each axis
    histo, xedges, yedges = np.histogram2d(x, y, nbins)
    plt.plot(x, y, 'wo', alpha=0.3)  # overlay the raw points
    # histogram2d indexes counts as [x, y]; transpose so x runs horizontally
    plt.imshow(histo.T,
               extent=[xedges.min(), xedges.max(), yedges.min(), yedges.max()],
               origin='lower',
               interpolation='nearest',
               cmap=plt.cm.hot)
    plt.show()

def func1d(x, nbins):
    histo, bin_edges = np.histogram(x, nbins)
    bin_center = 0.5*(bin_edges[1:] + bin_edges[:-1])  # midpoint of each bin
    plt.step(bin_center, histo, where='mid')
    plt.show()

x = np.random.normal(1.5, 1.0, (1000, 1000))

func1d(x[0], 40)
func2d(x[0], x[1], 40)

Of course, you should check that the bin centering is right for your data, but I think the example shows some useful things about this topic.

My recommendation: try to avoid explicit Python loops in numerical code; they are slow. As you can see, my example contains no loops. The best practice for numerical problems in Python is to avoid loops: NumPy has many C-implemented functions that do the heavy looping for you.
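Applying that advice to the question's func(), here is a minimal loop-free sketch that never touches pyplot. It assumes the intent is X = bin centers and Y = 1 / (combined count) for the non-empty bins, and that both samples share the same bin edges, as they do in plt.hist([x, y], Nbins). The name inverse_counts is hypothetical, and np.histogram_bin_edges requires NumPy 1.15 or later.

```python
import numpy as np

def inverse_counts(x, y, nbins):
    """Hypothetical vectorized version of the question's func()."""
    # Shared bin edges over both samples combined
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=nbins)
    xf, _ = np.histogram(x, bins=edges)
    yf, _ = np.histogram(y, bins=edges)
    centers = 0.5 * (edges[1:] + edges[:-1])  # bin midpoints
    total = (xf + yf).astype(float)
    mask = total != 0  # skip empty bins instead of dividing by zero
    return centers[mask], 1.0 / total[mask]

rng = np.random.default_rng(0)
x1 = rng.normal(1.5, 1.0, 1000)
x2 = rng.normal(0.0, 1.0, 1000)
X, Y = inverse_counts(x1, x2, 50)
print(X.shape, Y.shape)
```

The boolean mask plays the role of the `if (xf[j] + yf[j]) != 0` filter in the original list comprehension.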

Pablo

No.

But you can bypass pyplot:

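import numpy as np
from matplotlib.figure import Figure
from matplotlib.axes import Axes

data = np.random.normal(size=1000)  # example data

fig = Figure()                   # not registered with pyplot
ax = Axes(fig, (0, 0, 1, 1))     # standalone Axes, never made "current"
numeric_results = ax.hist(data)  # (counts, bin_edges, patches)
del ax, fig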

It won't affect the active figure and axes, so it is safe to use even in the middle of plotting something else.

This matters because any pyplot drawing call (plt.hist(), plt.plot(), etc.) draws into the current axes, which is global state.
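To see the global-state point concretely, here is a small check, assuming a headless Agg backend and made-up data: a histogram computed on a standalone Axes leaves pyplot's current axes untouched, while plt.hist() adds its bars there. This is likely why a partial histogram showed up in the plot made after calling func().

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the demo
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

rng = np.random.default_rng(1)
data = rng.normal(size=500)  # made-up data

plt.figure()
current = plt.gca()  # pyplot's current axes
before = len(current.patches)

# Standalone Axes: the histogram is computed without touching pyplot's state
standalone = Axes(Figure(), (0, 0, 1, 1))
counts, edges, _ = standalone.hist(data, bins=10)
after_standalone = len(current.patches)

# pyplot call: draws 10 bar patches into the current axes
plt.hist(data, bins=10)
after_pyplot = len(current.patches)
plt.close("all")

print(before, after_standalone, after_pyplot)
```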

Śmigło

If you would like to simply compute the histogram (that is, count the number of points in each bin) without displaying it, use the np.histogram() function.
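A tiny worked example of what "counting the points in each bin" means, using made-up data small enough to check by hand. Bins are half-open [left, right) except the last one, which also includes its right edge.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3])

# Three equal-width bins spanning [1, 3]; the last bin includes its right edge
counts, edges = np.histogram(data, bins=3)

# Explicit edges: [0, 2) and [2, 4]
counts2, edges2 = np.histogram(data, bins=[0, 2, 4])

print(counts, edges)
print(counts2)
```

Here counts is [1, 2, 3] with edges at 1, 5/3, 7/3 and 3, and counts2 is [1, 5].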

Izaskun