
I am trying to plot histograms of multiple attributes grouped by another attribute, all of them columns in a dataframe.

With the help of this question, I am able to set a title for the plot. Is there an easy way to switch on the legend for each subplot?

Here is my code:

import numpy as np
from numpy.random import randn, randint
import pandas as pd
from pandas import DataFrame
import pylab as pl

x = DataFrame(randn(100).reshape(20, 5), columns=list('abcde'))
# one group label (0, 1 or 2) per row
x['new'] = pd.Series(randint(0, 3, len(x)))
x.hist(by='new')
pl.suptitle('hist by new')


vumaasha

1 Answer


You can almost get what you want by doing the following, where g is the groupby object (g = x.groupby(group_col) in the code below):

g.plot(kind='bar')

but it produces one plot per group (and doesn't name the plots after the groups, so it's a bit useless IMO).

Here's something which looks rather beautiful, but does involve quite a lot of "manual" matplotlib work, which everyone wants to avoid, but no one can:

import numpy.random as rnd
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm

# 20 rows of random data in five columns
x = pd.DataFrame(rnd.randn(100).reshape(20, 5), columns=list('abcde'))

# assign every row to one of three named groups
group_col = 'groups'
groups = ['foo', 'bar', 'baz']
x[group_col] = pd.Series(rnd.choice(groups, len(x)))

g = x.groupby(group_col)
num_groups = g.ngroups

# one subplot per group, stacked vertically
fig, axes = plt.subplots(num_groups)
for i, (k, group) in enumerate(g):
    ax = axes[i]
    ax.set_title(k)
    # drop the grouping column so only the data columns are plotted
    group = group[[c for c in group.columns if c != group_col]]
    num_columns = len(group.columns)
    # one colour per column, spread evenly over the Spectral colormap
    colours = cm.Spectral([float(n) / num_columns for n in range(num_columns)])
    # passing label= is what gives ax.legend() its entries
    ax.hist(group.values, 5, histtype='bar',
            label=list(group.columns), color=colours,
            linewidth=1, edgecolor='white')
    ax.legend()

plt.show()

Which I think gives you what you want:

[image: Beautiful histogram]
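If you need this more than once, the loop above can be wrapped in a small function. This is only a sketch: hist_by_group is a made-up helper name (it is not part of pandas or matplotlib), the sample data mimics the question's integer 'new' column with deterministic labels, and the Agg backend line is only there so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line to view plots interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def hist_by_group(df, group_col, bins=5):
    """Draw one histogram subplot per group, each with its own legend.

    hist_by_group is a hypothetical helper name, not a pandas function.
    """
    grouped = df.groupby(group_col)
    # squeeze=False keeps axes two-dimensional even when there is only one group
    fig, axes = plt.subplots(grouped.ngroups, squeeze=False)
    for ax, (key, group) in zip(axes[:, 0], grouped):
        data = group.drop(columns=group_col)
        # each column becomes one labelled dataset, so the legend has entries
        ax.hist(data.values, bins=bins, histtype='bar', label=list(data.columns))
        ax.set_title(str(key))
        ax.legend()
    fig.suptitle('hist by ' + group_col)
    return fig, axes[:, 0]


# sample data shaped like the question's: five value columns plus a group column
x = pd.DataFrame(np.random.randn(20, 5), columns=list('abcde'))
x['new'] = np.arange(len(x)) % 3  # deterministic group labels 0, 1, 2
fig, axes = hist_by_group(x, 'new')
```

Call plt.show() (with an interactive backend) or fig.savefig(...) afterwards to see the result.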


Update: In response to comments (and as this answer is a few years old), I've tried to strip this answer down to its barest bones. There may now be a way of labelling plots of groupby objects, but I don't know of one.

Here's the simplest possible way to do this:

axes = g.plot(kind='hist')
# pair each returned Axes with its group by position, not by label
for ax, (groupname, group) in zip(axes, g):
    ax.set_title(groupname)
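A possible middle ground, if you want all groups in one figure but would rather let pandas draw the histograms and legends: create the subplots yourself and pass each one to DataFrame.plot via ax=. This is a sketch under assumptions, not the one true way: the column name 'groups' and the deterministic labels are just example data, the Agg backend line is only there so it runs without a display, and note that kind='hist' overlays the columns (hence alpha) rather than drawing side-by-side bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line to view plots interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# example data: five value columns plus a string group column
x = pd.DataFrame(np.random.randn(20, 5), columns=list('abcde'))
x['groups'] = np.resize(['foo', 'bar', 'baz'], len(x))  # labels repeated cyclically
g = x.groupby('groups')

# one subplot per group; squeeze=False keeps axes two-dimensional
fig, axes = plt.subplots(g.ngroups, squeeze=False)
for ax, (groupname, group) in zip(axes[:, 0], g):
    # pandas draws onto the given Axes and adds the legend and title itself
    group.drop(columns='groups').plot(kind='hist', bins=5, alpha=0.5,
                                      ax=ax, title=groupname, legend=True)
```

The only matplotlib call left is plt.subplots; titles and legends come from the pandas plot call.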
LondonRob
  • Wonderful. So there is no way to get this done without getting your hands dirty with the real matplotlib API. This is a pandas limitation, am I right? – vumaasha Jun 18 '15 at 17:30
  • I am looking for histograms. How can I convert the bars into histograms? – vumaasha Jun 18 '15 at 18:48
  • I've updated the answer to use histograms (and make the result much prettier). – LondonRob Jun 19 '15 at 16:57
  • @LondonRob I'm no heavy pandas user, but I'm using it to manage a gradebook and am in need of the same help as vumaasha. It seems that nowadays, as of pandas 0.20.3, such an automatic feature is still not implemented. Do you happen to know if that is correct? – saintsfan342000 Oct 05 '17 at 19:42