Are there functions to retrieve the histogram counts of a Series in pandas?

Question

There is a method to plot Series histograms, but is there a function to retrieve the histogram counts to do further calculations on top of it?

I keep using numpy's functions for this and converting the result to a DataFrame or Series when I need one. It would be nice to stay with pandas objects the whole time.

Andy Hayden · Accepted Answer · 2013-06-17T13:57:00.673

18

If your Series is discrete, you could use value_counts:

In [11]: s = pd.Series([1, 1, 2, 1, 2, 2, 3])

In [12]: s.value_counts()
Out[12]:
2    3
1    3
3    1
dtype: int64
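Since the goal is further calculation, here is a small sketch of common follow-ups on the same Series: reordering the counts by value, turning them into relative frequencies with normalize=True, and taking a running sum to get an empirical cumulative distribution:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 1, 2, 2, 3])

# value_counts orders by frequency; sort_index puts the distinct values in order
counts = s.value_counts().sort_index()

# normalize=True gives relative frequencies instead of raw counts
freqs = s.value_counts(normalize=True).sort_index()

# A running sum of the sorted frequencies is the empirical CDF at each value
cdf = freqs.cumsum()

print(counts.to_dict())  # {1: 3, 2: 3, 3: 1}
```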

For discrete data like this, s.hist() shows essentially the same information as s.value_counts().sort_index().plot(kind='bar').

If it contains floats, an awful hacky solution could be to use groupby:

s.groupby(lambda i: np.floor(2*s[i]) / 2).count()
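The same groupby idea can be written by grouping on a Series of bin labels computed from the values, which avoids looking each value up by index label inside a lambda. A minimal sketch that keeps the 0.5 bin width from the expression above; the sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1, 0.3, 0.6, 0.7, 1.2, 1.9])

# Map each value to the left edge of its width-0.5 bin, e.g. 0.7 -> 0.5
left_edges = np.floor(2 * s) / 2

# Grouping by a Series aligns on the index, so no lambda over labels is needed
counts = s.groupby(left_edges).count()
```

Unlike np.histogram, bins that contain no values simply do not appear in the result.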

edited Jun 17 '13 at 13:57

answered Jun 17 '13 at 13:38

Andy Hayden

I have floating point numbers. :( Most of those counts will be 1. This may still be useful for cumulative distributions though, thanks. Can I resample somehow like I can do with TimeSeries? – Rafael S. Calsaverini Jun 17 '13 at 13:41
@RafaelS.Calsaverini ah, I see! – Andy Hayden Jun 17 '13 at 13:43
@RafaelS.Calsaverini well, I have a hacky way; it seems likely there is a better one (pandas-fu isn't with me today). – Andy Hayden Jun 17 '13 at 13:54

Note that `value_counts` now has a `bins` argument to handle floats nicely. – IanS Feb 15 '19 at 08:56
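The bins argument of value_counts is documented as an integer number of equal-width bins. When you need specific edges instead, one option is to call pd.cut directly and count the resulting categories. The sample values and edges below are arbitrary, for illustration only:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1, 0.4, 0.45, 0.9, 1.3, 1.7])

# Explicit, evenly spaced edges (an arbitrary choice for this example)
edges = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

# pd.cut labels each value with its (left, right] interval;
# sort=False keeps the bins in edge order instead of ordering by count
counts = pd.cut(s, edges).value_counts(sort=False)
```

Because pd.cut returns a categorical, empty bins still show up with a count of 0. Intervals are closed on the right by default, so a value equal to the first edge is left uncounted unless you pass include_lowest=True.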

score 14 · Answer 2 · answered Jun 17 '13 at 15:02

Since hist and value_counts don't use the Series' index, you may as well treat the Series like an ordinary array and use np.histogram directly. Then build a Series from the result.

In [4]: s = pd.Series(np.random.randn(100))

In [5]: counts, bins = np.histogram(s)

In [6]: pd.Series(counts, index=bins[:-1])
Out[6]: 
-2.968575     1
-2.355032     4
-1.741488     5
-1.127944    26
-0.514401    23
 0.099143    23
 0.712686    12
 1.326230     5
 1.939773     0
 2.553317     1
dtype: int32

This is a really convenient way to organize the result of a histogram for subsequent computation.

To index by the center of each bin instead of the left edge, you could use bins[:-1] + np.diff(bins)/2.
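Putting both suggestions together, here is a sketch that labels each count by its bin center. The seeded generator is only there to make the example reproducible:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

counts, bins = np.histogram(s, bins=10)

# Midpoint of each bin: left edge plus half the bin width
centers = bins[:-1] + np.diff(bins) / 2

hist = pd.Series(counts, index=centers, name="count")
```

np.histogram closes its last bin on the right, so every value is counted. It cannot auto-detect a range when the data contains NaN, so call s.dropna() first if that can happen.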

This is close to what I usually do. I was just curious if there was a built-in pandas function for that. – Rafael S. Calsaverini Jun 18 '13 at 12:05

IanS · Answer 3 · 2016-12-23T08:50:35.147

7

If you know the number of bins you want, you can use pandas' cut function, which value_counts now exposes through its bins argument. Using the same random example:

s = pd.Series(np.random.randn(100))
s.value_counts(bins=5)

Out[55]: 
(-0.512, 0.311]     40
(0.311, 1.133]      25
(-1.335, -0.512]    14
(1.133, 1.956]      13
(-2.161, -1.335]     8

edited Dec 23 '16 at 08:50

answered Dec 19 '16 at 11:12

IanS

score 0 · Answer 4 · answered Nov 16 '21 at 07:52

Based on this answer from a related question, you can get the bin edges and histogram counts as follows:

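import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(100))
ax = s.hist()

# Each bar is a Rectangle whose bounding box runs from (left edge, 0) to (right edge, count)
for rect in ax.patches:
    ((x0, y0), (x1, y1)) = rect.get_bbox().get_points()
    print(((x0, y0), (x1, y1)))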
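Instead of printing bounding boxes, the same idea can read the counts and left edges directly off the bar rectangles. This sketch assumes matplotlib is installed and uses the non-interactive Agg backend. It also checks the result against np.histogram, since Series.hist draws 10 bins by default:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

plt.figure()   # start from a fresh figure so ax.patches holds only these bars
ax = s.hist()  # draws on the current axes and returns them

# Each bar is a Rectangle: its height is the count, its x is the left bin edge
heights = np.array([rect.get_height() for rect in ax.patches])
lefts = np.array([rect.get_x() for rect in ax.patches])

counts, edges = np.histogram(s, bins=10)
```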