
I have a dataframe with 3 columns: one of them is a "groupby" column, the other two are "normal" columns with values. I want to generate a boxplot and a bar chart as well. On the bar chart I want to visualize the number of occurrences of each group's element. The sample code below describes this dataframe in more detail:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

li_str = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']

df = pd.DataFrame([[i] + j[k] for i, j in {li_str[i]: np.random.randn(j, 2).tolist()
                   for i, j in enumerate(np.random.randint(5, 15, len(li_str)))}.items()
                   for k in range(len(j))], columns=['A', 'B', 'C'])

So above I generate a random number of random values for every element of li_str, for both columns B and C.
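The nested comprehension is compact but hard to follow. Here is a sketch of an equivalent explicit loop that produces the same shape of data. The fixed seed (`RandomState(0)`) is my own addition so the output is reproducible; the question uses unseeded randomness.

```python
import numpy as np
import pandas as pd

# Seed is an assumption added for reproducibility; the question does not seed
rng = np.random.RandomState(0)
li_str = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']

rows = []
for label in li_str:
    n = rng.randint(5, 15)          # group size: 5..14 inclusive
    for b, c in rng.randn(n, 2):    # n random (B, C) pairs for this label
        rows.append([label, b, c])

df = pd.DataFrame(rows, columns=['A', 'B', 'C'])
print(df.groupby('A').size())       # rows per group
```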

First, I plot only the boxplot:

fig, ax = plt.subplots(figsize=(16,6))
p1 = df.boxplot(ax=ax, column='B', by='A', sym='')

My result is: [image: boxplot of B grouped by A]

Next, I plot the number of elements in each group (i.e. the random group sizes generated above by np.random.randint(5, 15, len(li_str))):

fig, ax = plt.subplots(figsize=(16,6))

df_gb = df.groupby('A').count()

p2 = df_gb['B'].plot(ax=ax, kind='bar', figsize=(16,6), colormap='Set2', alpha=0.3)
plt.ylim([0, 20])

My result is: [image: bar chart of the number of rows in each group]
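The bar heights above come from `groupby('A').count()`, which counts the non-missing values in each column per group; `groupby('A').size()` counts rows regardless of missing values. With the generated data (no NaNs) both give the same numbers. The small frame below, including its NaN, is a hypothetical example to show where they differ.

```python
import numpy as np
import pandas as pd

# Hypothetical tiny frame; the NaN in B is added only to show the difference
df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'two'],
                   'B': [0.1, np.nan, 0.3, 0.4, 0.5],
                   'C': [1.0, 2.0, 3.0, 4.0, 5.0]})

counts = df.groupby('A').count()['B']  # non-missing B values: one -> 1, two -> 3
sizes = df.groupby('A').size()         # rows per group:       one -> 2, two -> 3
print(counts)
print(sizes)
```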

And now I want these two in one diagram:

fig, ax = plt.subplots(figsize=(16,6))
ax2 = ax.twinx()

df_gb = df.groupby('A').count()

p1 = df.boxplot(ax=ax, column='B', by='A', sym='')
p2 = df_gb['B'].plot(ax=ax2, kind='bar', figsize=(16,6)
    , colormap='Set2', alpha=0.3, secondary_y=True)
plt.ylim([0, 20])

My result is: [image: combined chart in which the boxes are shifted one tick to the right of the bars]

Does anybody know why my boxplot is shifted one x-axis tick to the right? I use Python 3.5.1, pandas 0.17.0 and matplotlib 1.4.3.

Thank you!!!

ragesz

1 Answer


It's because the boxplot and the bar plot do not place their xticks at the same positions, even though the labels are the same.

df.boxplot(column='B', by='A')
plt.xticks()

(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]), <a list of 10 Text xticklabel objects>)

df.groupby('A').count()['B'].plot(kind='bar')
plt.xticks()

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

At a glance it looks to me like an inconsistency which should be fixed in matplotlib boxplot(), but I might just be overlooking the rationale.
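The tick mismatch can also be checked programmatically instead of reading `plt.xticks()` by eye. A minimal sketch, assuming a small hypothetical frame with five groups and the off-screen `Agg` backend: the boxes land at 1..N (matplotlib's default `positions` for `boxplot`), while pandas places bars at 0..N-1.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering; not needed in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 5 groups of 4 values each
rng = np.random.RandomState(0)
df = pd.DataFrame({'A': np.repeat(list('abcde'), 4),
                   'B': rng.randn(20)})

fig, (ax_box, ax_bar) = plt.subplots(1, 2)
df.boxplot(column='B', by='A', ax=ax_box)
df.groupby('A').count()['B'].plot(kind='bar', ax=ax_bar)

box_ticks = ax_box.get_xticks()  # boxes sit at 1..N
bar_ticks = ax_bar.get_xticks()  # bars sit at 0..N-1
print(box_ticks, bar_ticks)
```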

As a workaround, use matplotlib's bar(), which lets you place the bars at the boxplot's xtick positions (I did not find a way to do this with df.plot(kind='bar')).

df.boxplot(column='B', by='A')
plt.twinx()
# Positions passed positionally: matplotlib 1.x named this argument `left`, 2.0+ names it `x`
plt.bar(plt.xticks()[0], df.groupby('A').count()['B'],
        align='center', alpha=0.3)
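Another way around the mismatch, not from the original answer, is to skip pandas plotting and tell matplotlib's `Axes.boxplot` where to put the boxes via its `positions` argument, so boxes and bars share the same 0..N-1 positions. This is a sketch assuming hypothetical data with group sizes 5, 7, 9, 11 and 13 and the off-screen `Agg` backend.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering; not needed in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: groups a..e with 5, 7, 9, 11 and 13 rows
rng = np.random.RandomState(0)
df = pd.DataFrame({'A': np.repeat(list('abcde'), [5, 7, 9, 11, 13]),
                   'B': rng.randn(45)})

labels = sorted(df['A'].unique())                    # same order as groupby / by='A'
data = [df.loc[df['A'] == lab, 'B'].values for lab in labels]
counts = [len(d) for d in data]
pos = np.arange(len(labels))                         # 0..N-1 for both boxes and bars

fig, ax = plt.subplots(figsize=(16, 6))
ax2 = ax.twinx()
ax.boxplot(data, positions=pos, sym='')              # boxes placed explicitly at pos
ax2.bar(pos, counts, align='center', alpha=0.3)      # bars centred on the same pos
ax.set_xticks(pos)
ax.set_xticklabels(labels)
```

Because the positions are chosen explicitly, the same `pos` array can be reused for any further series drawn on either axis.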

[image: boxplot with the count bars overlaid on a secondary y-axis]

Stop harming Monica
  • Thank you for this working solution. My problem now is that I also want to plot the mean (a simple line), but neither matplotlib `plt.plot()` nor pandas `df.plot()` works for me. More precisely, I'm not able to specify the xticks for these functions :( – ragesz Feb 26 '16 at 14:26
  • Can confirm that this works; I spent quite a bit of time scratching my head over this one. – Jul 07 '18 at 04:03
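The same trick used for the bars should also cover the mean line asked about in the first comment: read the boxplot's tick positions and use them as the x coordinates for a plain `plot()` call. A sketch under assumptions: hypothetical data, the `Agg` backend, and the mean drawn on the boxplot's own axes so it shares the B scale.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering; not needed in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: groups a..e with 5, 7, 9, 11 and 13 rows
rng = np.random.RandomState(0)
df = pd.DataFrame({'A': np.repeat(list('abcde'), [5, 7, 9, 11, 13]),
                   'B': rng.randn(45)})

fig, ax = plt.subplots(figsize=(16, 6))
df.boxplot(column='B', by='A', ax=ax, sym='')

pos = ax.get_xticks()                          # the boxplot's own positions (1..N)
means = df.groupby('A')['B'].mean()            # sorted by label, same order as the boxes
ax.plot(pos, means.values, 'r-o')              # mean line on the boxplot's axes

ax2 = ax.twinx()                               # counts on a secondary y-axis
counts = df.groupby('A')['B'].count()
ax2.bar(pos, counts.values, align='center', alpha=0.3)
```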