I want to create graphs like those seen here, but in Python, using matplotlib or seaborn. I could not find a way to get both horizontal and vertical whiskers on one plot, nor a way to resize the box itself along the X axis so that it reflects a second variable.
1 Answer
Using matplotlib's boxplot command
Matplotlib's boxplot has options to set the position and the width of the boxplot. The box's rectangle spans from the first to the third quartile. For each boxplot, the quartiles of the other variable are used: their difference serves as the width and their midpoint as the position. (Note that this midpoint is in general different from both the median and the mean of the dataset.)
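As a quick check of how `positions` and `widths` place the box, here is a minimal sketch on synthetic data (the sample and the values 2.0 and 0.5 are made up for illustration). It inspects the artists returned by `boxplot` and compares the box outline with the quartiles from `np.percentile`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic sample, only for illustration (not part of the iris example).
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=200)

fig, ax = plt.subplots()
# One vertical boxplot centered at x = 2.0 with a total width of 0.5.
bp = ax.boxplot(data, positions=[2.0], widths=0.5, manage_ticks=False)

# Without patch_artist, the box is a Line2D tracing the rectangle outline.
box = bp['boxes'][0]
box_x = np.asarray(box.get_xdata())
box_y = np.asarray(box.get_ydata())

q1, q3 = np.percentile(data, [25, 75])
print('box x-range:', box_x.min(), box_x.max())  # position -/+ half the width
print('box y-range:', box_y.min(), box_y.max())  # first and third quartile
print('quartiles:  ', q1, q3)
```

The box extends half of `widths` to each side of `positions` along one axis, and from the first to the third quartile along the other.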
The following code first draws the horizontal boxplots with adjusted widths. The vertical boxplots are then drawn on top, but without the box (showbox=False), so only their whiskers, caps, median line and outliers are added. Changing the boxplot's parameters allows for more adjustments and fine-tuning.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # only used to download the iris data set

iris = sns.load_dataset('iris')
categories = iris['species'].unique()
colors = plt.cm.Set3.colors
x_column, y_column = 'sepal_length', 'sepal_width'
for cat, color in zip(categories, colors):
    # x-axis: sepal length
    x_data = iris[iris['species'] == cat][x_column]
    x_q1, x_q3 = np.percentile(x_data, [25, 75])
    # y-axis: sepal width
    y_data = iris[iris['species'] == cat][y_column]
    y_q1, y_q3 = np.percentile(y_data, [25, 75])
    # horizontal boxplot of x: centered between y's quartiles, as thick as y's interquartile range
    plt.boxplot(x_data, positions=[(y_q3 + y_q1) / 2], widths=y_q3 - y_q1, vert=False,
                patch_artist=True, manage_ticks=False,
                boxprops={'facecolor': color, 'label': cat})
    # vertical boxplot of y, drawn without its box (so no boxprops are needed)
    plt.boxplot(y_data, positions=[(x_q3 + x_q1) / 2], widths=x_q3 - x_q1,
                showbox=False, manage_ticks=False)
plt.xlabel(x_column)
plt.ylabel(y_column)
plt.legend()  # uses the labels assigned to the boxes
plt.tight_layout()
plt.show()
Positioning the central line via the median of other axis
As boxplots are always drawn symmetrically around their position, using a central line that isn't the midpoint between q1 and q3 of the other direction requires drawing the central box separately from the boxplot itself (here as a Rectangle patch). The distance between q1 and q3 still serves to scale the median line and the whisker caps, which stay symmetric around the position.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # only used to download the iris data set

iris = sns.load_dataset('iris')
categories = iris['species'].unique()
colors = plt.cm.Set3.colors
x_column, y_column = 'sepal_length', 'sepal_width'
for cat, color in zip(categories, colors):
    # x-axis: sepal length
    x_data = iris[iris['species'] == cat][x_column]
    x_q1, x_median, x_q3 = np.percentile(x_data, [25, 50, 75])
    # y-axis: sepal width
    y_data = iris[iris['species'] == cat][y_column]
    y_q1, y_median, y_q3 = np.percentile(y_data, [25, 50, 75])
    # horizontal whiskers and median line for x, placed at the median of y
    plt.boxplot(x_data, positions=[y_median], widths=y_q3 - y_q1, vert=False,
                showbox=False, manage_ticks=False)
    # vertical whiskers and median line for y, placed at the median of x
    plt.boxplot(y_data, positions=[x_median], widths=x_q3 - x_q1,
                showbox=False, manage_ticks=False)
    # the central box, drawn separately from q1 to q3 in both directions
    plt.gca().add_patch(plt.Rectangle((x_q1, y_q1), x_q3 - x_q1, y_q3 - y_q1,
                                      facecolor=color, edgecolor='black', label=cat))
plt.xlabel(x_column)
plt.ylabel(y_column)
plt.legend()  # uses the labels assigned to the rectangles
plt.tight_layout()
plt.show()
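If the same plot is needed for several datasets, the median-centered variant can be wrapped in a small helper. The following is only a sketch: the function name `boxplot_2d`, its signature and the synthetic demo groups are assumptions made for illustration, not part of matplotlib or of the iris example. It uses the same `boxplot` and `Rectangle` calls as the code above, applied to an explicit Axes.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def boxplot_2d(ax, x, y, color, label=None):
    """Hypothetical helper: draw one two-dimensional boxplot on `ax`."""
    x, y = np.asarray(x), np.asarray(y)
    x_q1, x_median, x_q3 = np.percentile(x, [25, 50, 75])
    y_q1, y_median, y_q3 = np.percentile(y, [25, 50, 75])
    # Horizontal whiskers, caps, median line and outliers for x.
    ax.boxplot(x, positions=[y_median], widths=y_q3 - y_q1, vert=False,
               showbox=False, manage_ticks=False)
    # Vertical whiskers, caps, median line and outliers for y.
    ax.boxplot(y, positions=[x_median], widths=x_q3 - x_q1,
               showbox=False, manage_ticks=False)
    # Central box from q1 to q3 in both directions.
    rect = Rectangle((x_q1, y_q1), x_q3 - x_q1, y_q3 - y_q1,
                     facecolor=color, edgecolor='black', label=label)
    ax.add_patch(rect)
    return rect


# Synthetic demo data (made up for illustration).
rng = np.random.default_rng(42)
groups = {
    'group A': (rng.normal(5.0, 0.4, 100), rng.normal(3.4, 0.4, 100)),
    'group B': (rng.normal(6.5, 0.6, 100), rng.normal(2.9, 0.3, 100)),
}

fig, ax = plt.subplots()
rects = {}
for (name, (x, y)), color in zip(groups.items(), plt.cm.Set3.colors):
    rects[name] = boxplot_2d(ax, x, y, color=color, label=name)
ax.legend()  # uses the labels of the rectangles
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Passing the Axes explicitly instead of relying on `plt.gca()` makes the helper usable with subplots.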
