I want to draw a box plot from my DataFrame, which already contains the summary statistics:

     A     B     C
max  10    11    14
min  3     4     10
q1   5     6     12
q3   9     7     13

How can I draw a box plot from these fixed values?

MasterBeast64

1 Answer

You can use the Axes.bxp method in matplotlib, based on this helpful answer. The input is a list of dictionaries, one per box, containing the relevant statistics; the median ('med') is a required key in each dictionary. Since the data you provided does not include medians, I have made up medians in the code below (you will need to calculate them from your actual data).

import matplotlib.pyplot as plt
import pandas as pd

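# reproducing your data, with the statistic names as the index
df = pd.DataFrame({'A':[10,3,5,9],'B':[11,4,6,7],'C':[14,10,12,13]},
                  index=['max','min','q1','q3'])

# add a row for the median (made-up values; compute real medians from your data)
# DataFrame.append was removed in pandas 2.0, so the row is assigned with .loc instead
sample_medians = {'A':7, 'B':6.5, 'C':12.5}
df.loc['med'] = pd.Series(sample_medians)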

Here is the modified df with medians included:

>>> df
        A     B     C
max  10.0  11.0  14.0
min   3.0   4.0  10.0
q1    5.0   6.0  12.0
q3    9.0   7.0  13.0
med   7.0   6.5  12.5
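If the raw observations behind the summary table are available, the medians do not need to be made up. A minimal sketch, assuming hypothetical raw values in a dictionary named `raw` (the question only gives summary values) and using `matplotlib.cbook.boxplot_stats`, which returns dictionaries in the format `bxp` expects. The helper name `stats_from_raw` is illustrative, and `whis=(0, 100)` is an assumption that puts the whiskers at the minimum and maximum:

```python
import numpy as np
from matplotlib import cbook

# Hypothetical raw observations; replace with the real data behind the summary table.
raw = {
    'A': [3, 5, 6, 7, 8, 9, 10],
    'B': [4, 6, 6, 6.5, 7, 7, 11],
    'C': [10, 12, 12, 12.5, 13, 13, 14],
}

def stats_from_raw(raw, whis=(0, 100)):
    """Return one bxp-style statistics dictionary per named sample."""
    stats = []
    for name, values in raw.items():
        # boxplot_stats returns a list with one dictionary per dataset
        s = cbook.boxplot_stats(np.asarray(values, dtype=float), whis=whis)[0]
        s['label'] = name  # used by bxp as the tick label
        stats.append(s)
    return stats

bxp_stats_from_raw = stats_from_raw(raw)
```

The resulting list can be passed to `ax.bxp(...)` in place of a hand-built list. Note that the quartiles computed here come from the raw values and will generally differ from the fixed values in the question.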

Now we transform the df into a list of dictionaries:

# build one dictionary of statistics per column, labeled with the column name
bxp_stats = [
    {'label': col, 'med': df.loc['med', col],
     'q1': df.loc['q1', col], 'q3': df.loc['q3', col],
     'whislo': df.loc['min', col], 'whishi': df.loc['max', col]}
    for col in df.columns
]

_, ax = plt.subplots()
ax.bxp(bxp_stats, showfliers=False)
plt.show()

Unfortunately, bxp requires a median for every box, so it cannot simply be left out. As a workaround, we draw the median line as thin as possible and in the same color as the box, which makes it virtually invisible.

The styling arguments passed to bxp apply to every box drawn in that call, so one way to give each box its own specifications is to draw each box in its own subplot. I understand if this looks kind of ugly, so you can play around with the spacing between subplots or consider removing some of the y-axes.

fig, axes = plt.subplots(nrows=1, ncols=3, sharey=True)

# background colors; each median line uses its box's color and as thin a width as possible
colors = ['LightCoral', '#FEF1B5', '#EEAEEE']
medianprops = [dict(linewidth=0.1, color=color) for color in colors]
# draw one box per subplot, giving a list of 3 bxp results
bplots = [axes[i].bxp([bxp_stats[i]], medianprops=medianprops[i], patch_artist=True, showfliers=False) for i in range(len(df.columns))]
# set each boxplot a different color
for i, bplot in enumerate(bplots):
    for patch in bplot['boxes']:
        patch.set_facecolor(colors[i])
plt.show()
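As a possible alternative to separate subplots, the dictionary returned by `bxp` holds one artist per box, so each box can be styled after drawing on a single Axes. The following is a sketch under assumptions, not part of the original answer: the helper name `draw_colored_boxes`, the `stats` and `colors` dictionaries, and the placeholder median (the midpoint of q1 and q3, supplied only because `bxp` needs the key) are all illustrative. The median lines are then hidden rather than thinned.

```python
import matplotlib.pyplot as plt

# Fixed values from the question, keyed by column name.
stats = {
    'A': {'whislo': 3, 'q1': 5, 'q3': 9, 'whishi': 10},
    'B': {'whislo': 4, 'q1': 6, 'q3': 7, 'whishi': 11},
    'C': {'whislo': 10, 'q1': 12, 'q3': 13, 'whishi': 14},
}
# Colors chosen by name, as asked in the comments.
colors = {'A': 'red', 'B': 'yellow', 'C': 'purple'}

def draw_colored_boxes(ax, stats, colors):
    """Draw all boxes on one Axes, color them by name, and hide the medians."""
    bxp_stats = [
        # 'med' is a placeholder (midpoint of q1 and q3); its line is hidden below
        {'label': name, 'med': (s['q1'] + s['q3']) / 2, **s}
        for name, s in stats.items()
    ]
    artists = ax.bxp(bxp_stats, patch_artist=True, showfliers=False)
    # artists['boxes'] holds one patch per box, in the order the stats were given
    for patch, name in zip(artists['boxes'], stats):
        patch.set_facecolor(colors[name])
    # hide every median line instead of drawing it
    for line in artists['medians']:
        line.set_visible(False)
    return artists

fig, ax = plt.subplots()
artists = draw_colored_boxes(ax, stats, colors)
```

Calling `plt.show()` afterwards displays the figure. Because all boxes share one Axes here, the y-axis and tick labels are drawn only once.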

Derek O
  • Hi, thank you for your code. But what if I really do not want to plot the median? I do not have the median value. Thanks. – MasterBeast64 Jun 03 '20 at 04:25
  • One more question: what if I want a different color for each box? For example, the box of A is RED, B is YELLOW, C is PURPLE. I want to control the color based on the name (A, B and C). Is it possible? – MasterBeast64 Jun 03 '20 at 05:07
  • Since you have to specify the median, you can make it the same color as the box background and as thin as possible (hacky!). Since all of the boxplots are in the same figure, any parameters you pass to them will apply to every boxplot, so you will need to make subplots. I updated my answer to reflect these changes- I hope it helps! – Derek O Jun 03 '20 at 07:23