I want to create graphs like those seen here, but in Python using matplotlib or seaborn. I could not find a way to draw both horizontal and vertical whiskers on one plot, nor a way to resize the box itself along the x-axis so that it reflects a second variable.

Trenton McKinney
Liam Wilson

1 Answer

Using matplotlib's boxplot command

Matplotlib's boxplot has options to set the position and the width of each box (the positions and widths parameters). The box's rectangle spans the first to the third quartile; on the other axis, their difference serves as the width and their mean as the position. (This midpoint is generally different from both the median and the mean of the dataset.)
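To make the position and width computation concrete, here is a minimal sketch. The helper name box_position_and_width is hypothetical and not part of matplotlib. To stay dependency-free it uses the standard library's statistics.quantiles with method='inclusive', on the assumption that this gives the same quartiles as np.percentile with its default linear interpolation.

```python
from statistics import quantiles


def box_position_and_width(values):
    """Return (position, width) for a box spanning Q1 to Q3 of `values`.

    Hypothetical helper for illustration; not a matplotlib function.
    """
    # method='inclusive' interpolates linearly between the sorted data points
    q1, _median, q3 = quantiles(values, n=4, method='inclusive')
    position = (q1 + q3) / 2  # midpoint between the quartiles
    width = q3 - q1           # interquartile range
    return position, width


sample = [1, 2, 3, 4, 5, 10, 12, 14, 20]
position, width = box_position_and_width(sample)
print(position, width)
```

For this skewed sample the median is 5, while the box position lands at the midpoint of the quartiles, which is why the two should not be confused.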

The following code first draws the horizontal boxplots with adjusted widths. The vertical boxplots are then drawn on top, but without the box (showbox=False). Changing the other boxplot parameters allows for more adjustments and fine-tuning.

import matplotlib.pyplot as plt
import numpy as np  # needed for np.percentile
import seaborn as sns  # only used to download the iris data set

iris = sns.load_dataset('iris')
categories = iris['species'].unique()
colors = plt.cm.Set3.colors
x_column, y_column = 'sepal_length', 'sepal_width'
for cat, color in zip(categories, colors):
    # x-axis: sepal length
    x_data = iris[iris['species'] == cat][x_column]
    x_q1, x_q3 = np.percentile(x_data, [25, 75])
    # y-axis: sepal width
    y_data = iris[iris['species'] == cat][y_column]
    y_q1, y_q3 = np.percentile(y_data, [25, 75])
    # horizontal boxplot of x, placed and sized by the y quartiles; draws the filled box
    plt.boxplot(x_data, positions=[(y_q3 + y_q1) / 2], widths=y_q3 - y_q1, vert=False,
                patch_artist=True, manage_ticks=False,
                boxprops={'facecolor': color, 'label': cat})
    # vertical boxplot of y, placed and sized by the x quartiles; box omitted,
    # so no boxprops are needed here
    plt.boxplot(y_data, positions=[(x_q3 + x_q1) / 2], widths=x_q3 - x_q1,
                showbox=False, manage_ticks=False)
plt.xlabel(x_column)
plt.ylabel(y_column)
plt.legend()  # uses the labels assigned to the boxes
plt.tight_layout()
plt.show()

[Image: matplotlib double boxplot]

Positioning the central line via the median of the other axis

Boxplots are always drawn symmetrically around their position. To use a central line that isn't the midpoint between q1 and q3 of the other direction, the central box has to be drawn separately from the boxplot itself. The distance between q1 and q3 still serves to scale the median line and whisker caps (which stay symmetric).
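The geometry behind this can be checked numerically before plotting. The following is a minimal sketch with a hypothetical helper, rectangle_and_medians, that returns the parameters for the separately drawn box together with the medians used as boxplot positions. It uses statistics.quantiles with method='inclusive' instead of np.percentile to stay dependency-free, assuming both give the same quartiles.

```python
from statistics import quantiles


def rectangle_and_medians(x_values, y_values):
    """Return the box rectangle (anchor, width, height) and the two medians.

    Hypothetical helper for illustration; not a matplotlib function.
    """
    x_q1, x_median, x_q3 = quantiles(x_values, n=4, method='inclusive')
    y_q1, y_median, y_q3 = quantiles(y_values, n=4, method='inclusive')
    rect = {
        'xy': (x_q1, y_q1),      # lower-left corner of the box
        'width': x_q3 - x_q1,    # interquartile range along x
        'height': y_q3 - y_q1,   # interquartile range along y
    }
    return rect, (x_median, y_median)


x_sample = [1, 2, 3, 4, 5, 10, 12, 14, 20]
y_sample = [2, 4, 4, 5, 6, 7, 8, 9, 10]
rect, medians = rectangle_and_medians(x_sample, y_sample)
print(rect, medians)
```

With these samples the box spans x from 3 to 12, so its centre is at 7.5, while the x median is 5. A box drawn by boxplot itself would be centred on the given position, which is why the rectangle is added as a separate patch.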

import matplotlib.pyplot as plt
import numpy as np  # needed for np.percentile
import seaborn as sns  # only used to download the iris data set

iris = sns.load_dataset('iris')
categories = iris['species'].unique()
colors = plt.cm.Set3.colors
x_column, y_column = 'sepal_length', 'sepal_width'
for cat, color in zip(categories, colors):
    # x-axis: sepal length
    x_data = iris[iris['species'] == cat][x_column]
    x_q1, x_median, x_q3 = np.percentile(x_data, [25, 50, 75])
    # y-axis: sepal width
    y_data = iris[iris['species'] == cat][y_column]
    y_q1, y_median, y_q3 = np.percentile(y_data, [25, 50, 75])
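    # horizontal whiskers and median line for x, positioned at the y median
    plt.boxplot(x_data, positions=[y_median], widths=y_q3 - y_q1, vert=False,
                showbox=False, manage_ticks=False)
    # vertical whiskers and median line for y, positioned at the x median
    plt.boxplot(y_data, positions=[x_median], widths=x_q3 - x_q1,
                showbox=False, manage_ticks=False)
    # the box itself, drawn separately from q1 to q3 in both directions;
    # its label feeds the legend
    plt.gca().add_patch(plt.Rectangle((x_q1, y_q1), x_q3 - x_q1, y_q3 - y_q1,
                                      facecolor=color, edgecolor='black', label=cat))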
plt.xlabel(x_column)
plt.ylabel(y_column)
plt.legend()
plt.tight_layout()
plt.show()

[Image: boxplot in both horizontal and vertical direction, using the median as central line]

JohanC