Pandas Histogram of Filtered Dataframe

Question

This has been driving me mad for the last hour. I can draw a histogram when I use:

hist(df.GVW, bins=50, range=(0,200))

I use the following when I need to filter the dataframe for a given condition in one of the columns, for example:

df[df.TYPE=='SU4']

So far, everything works. When I try to get a histogram of this filtered data, I get KeyError: 0L. I use the following for the histogram of filtered data:

hist(df[df.TYPE=='SU4'].GVW, bins=50, range=(0,200))

Is there a syntax error somewhere? Thanks for the help!

consider using the series method hist rather than... whichever one (?) you are using. I suspect using values will work i.e. `df[df.TYPE=='SU4'].GVW.values` — Andy Hayden, Mar 18 '14 at 22:24
@AndyHayden Ah, posted it at the same time. You should put answers as an answer :-) — joris, Mar 18 '14 at 22:28
@AndyHayden it did work when I used the values attribute. Intuitively I expected it to work without that, though. Well, bad intuition :) — marillion, Mar 18 '14 at 22:29
@joris ha! It was more of a guess rather than an answer. marillion: IMO it's weird/unpythonic that hist cares about this and doesn't just iterate over it. — Andy Hayden, Mar 18 '14 at 22:36

joris · Accepted Answer · score 10 · 2014-03-18T22:30:51.097

Maybe try to use the .values attribute (this returns the data as a numpy array), so:

hist(df[df.TYPE=='SU4'].GVW.values, bins=50, range=(0,200))

I assume this does not work because the matplotlib hist method tries to access the element at index 0 of the input. But because a Series with an integer index treats `x[0]` as a label lookup rather than a position, this raises a KeyError for a filtered Series (whose first element no longer has the label 0).
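A minimal sketch of that label-versus-position behaviour. The DataFrame here is made-up sample data standing in for the question's df; only the TYPE and GVW column names come from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({
    "TYPE": ["SU2", "SU4", "SU4", "SU2", "SU4"],
    "GVW": [40.0, 120.5, 88.0, 35.2, 150.3],
})

filtered = df[df.TYPE == "SU4"].GVW
print(list(filtered.index))  # labels are kept from df: [1, 2, 4]

# On an integer index, [] is a label lookup, and no row is labelled 0 any more
try:
    filtered[0]
except KeyError as exc:
    print("KeyError:", exc)

# Position-based access still works
print(filtered.iloc[0])    # 120.5
print(filtered.values[0])  # 120.5, .values is a plain NumPy array
```

This is also why `.values` fixes the plotting call: the NumPy array has no index, so `[0]` always means "first element".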

And indeed, as @AndyHayden says, you can also use the pandas hist method:

df[df.TYPE=='SU4'].GVW.hist(bins=50)
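A self-contained sketch of the pandas route, again on hypothetical sample data. It assumes matplotlib is installed (pandas plotting needs it) and uses the non-interactive Agg backend so it also runs without a display; extra keyword arguments such as `range` are passed through to matplotlib's hist:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headless too
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({
    "TYPE": ["SU2", "SU4", "SU4", "SU2", "SU4"],
    "GVW": [40.0, 120.5, 88.0, 35.2, 150.3],
})

# Series.hist forwards range (and other extra kwargs) to matplotlib's hist,
# and returns the matplotlib Axes it drew on
ax = df[df.TYPE == "SU4"].GVW.hist(bins=50, range=(0, 200))
ax.set_xlabel("GVW")
# In an interactive session, call plt.show() to display the figure
```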

edited Mar 18 '14 at 22:30

answered Mar 18 '14 at 22:24

joris


finally saw that histogram on my screen :) yes, this works. No clue why it does not work without the `.values` attribute though. It works for the unmasked df. Strange... – marillion Mar 18 '14 at 22:27
Added a possible reason, but I'm not fully sure about it. For that reason, and to avoid this kind of problem, it can be better to either use the pandas plotting method or use the `.values` attribute. – joris Mar 18 '14 at 22:32
Here's a thread on the reason: https://github.com/matplotlib/matplotlib/issues/2775; call it an API incompatibility of matplotlib with pandas – Jeff Mar 18 '14 at 22:47
@Jeff seems like this would be an easy fix (`try: x = x.values`) ? – Andy Hayden Mar 18 '14 at 23:33
It has to be done on the matplotlib side, and I gave them a multitude of fairly generic suggestions: one was to use a values attribute, but the best was simply to check for an ndim attribute, which would work for all ndarray-like objects that are not actually subclasses – Jeff Mar 18 '14 at 23:40
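A rough sketch of the kind of ndim-based coercion Jeff describes. This is not matplotlib's actual code; `as_plain_array` is a hypothetical helper name used only for illustration:

```python
import numpy as np
import pandas as pd


def as_plain_array(x):
    """Hypothetical helper: make x[0] positional for ndarray-like input."""
    if isinstance(x, np.ndarray):
        return x  # already a real ndarray, indexing is positional
    if hasattr(x, "ndim"):
        # ndarray-like but not an ndarray subclass (e.g. a pandas Series):
        # np.asarray drops the index, so lookups become positional
        return np.asarray(x)
    return x  # plain sequences (lists, tuples) are already positional


s = pd.Series([120.5, 88.0, 150.3], index=[1, 2, 4])
print(as_plain_array(s)[0])  # 120.5
```

Checking for `ndim` rather than `isinstance(x, np.ndarray)` is what lets a Series, which is not an ndarray subclass, be caught and converted.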
I think I will be using the pandas `hist` method. It just feels more consistent. I still tend to mix up matplotlib and pandas plotting options, though. Thanks for all the help and comments! – marillion Mar 19 '14 at 16:20

score 4 · Answer 2 · answered May 21 '15 at 17:26


I had a similar issue plotting a dataframe I derived using a query. I found that calling the reset_index() function on the derived frame resolved the issue.
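A sketch of the reset_index() approach on hypothetical sample data. It uses `drop=True`, which is a choice made here rather than part of the answer: a plain `reset_index()` on a DataFrame keeps the old labels as a new "index" column, while `drop=True` discards them:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({
    "TYPE": ["SU2", "SU4", "SU4", "SU2", "SU4"],
    "GVW": [40.0, 120.5, 88.0, 35.2, 150.3],
})

derived = df.query("TYPE == 'SU4'")
print(list(derived.index))  # old labels survive the query: [1, 2, 4]

# Renumber rows 0..n-1 so label 0 exists again
derived = derived.reset_index(drop=True)
print(list(derived.index))  # [0, 1, 2]
print(derived.GVW[0])       # 120.5
```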


user3685329


That helped me as well – StationaryTraveller May 17 '16 at 19:41
