
I am trying to draw a mean line on violin plots. Since I could not find a way to make seaborn replace the "median" line that comes with inner="quartile", I decided to draw the mean line on top of each violin myself. I plan to draw horizontal lines with plt.plot at the mean value (y value) of each of the three violins I have.

I have the exact y (height) values where I want each horizontal line to be drawn. However, I am having difficulty working out the bounds of each violin at that specific y value. Since each violin is symmetric, the domain is (-x, x) around its center, so I need a way to find that "x" value so that I can add three horizontal lines, each bounded by its violin.

Here is my code. The x value in plt.plot is -0.37, which I found by trial and error; I want Python to find it for me for a given y value.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

data = [2.57e-05, 4.17e-06, -5.4e-06, -5.05e-06, 1.15e-05, -6.7e-06, 1.01e-05, 5.53e-06, 8.13e-06, 1.27e-05, 1.11e-06, -2.87e-06, -1.38e-06, -1.07e-05, -8.04e-06, 4.77e-06, 3.22e-07, 9.86e-06, 1.38e-05, 1.32e-05, -3.48e-06, -4.69e-06, 8.15e-06, 4.21e-07, 2.71e-06, 7.52e-08, 1.04e-06, -1.92e-06, -4.08e-06, 4.76e-06]

vg = sns.violinplot(data=data, inner="quartile", scale="width")
    
a = sns.pointplot(data=data, linestyles='-', join=False, ci=None, color='red')  # marks the mean as a point
        
for p in vg.lines:
    p.set_linestyle('-')
    p.set_linewidth(0.8)  # Sets the thickness of the quartile lines 
    p.set_color('white')  # Sets the color of the quartile lines 
    p.set_alpha(0.8)

for p in vg.lines[1::3]:  # these are the median lines; not means
    p.set_linestyle('-')
    p.set_linewidth(0)  # Sets the thickness of the median lines 
    p.set_color('black')  # Sets the color of the median lines 
    p.set_alpha(0.8)

# add a mean line from the edge of the violin plot
plt.plot([-0.37, 0], [np.mean(data), np.mean(data)], 'k-', lw=1)
plt.show()


In this picture I removed the median line but left the quartile lines. I want to draw the mean lines across the violins at the height of the visible blue dots.

Here is the result once I draw that plt.plot line, for case I only, using the x value I found by trial and error.


Trenton McKinney
Cindy Burker

1 Answer


You can draw a line that is too long, and then clip it with the polygon forming the violin.

Note that inner='quartile' shows the 25%, 50% and 75% lines. The 50% line is also known as the median. This is similar to how boxplots are usually drawn. Showing the mean in too similar a style would be rather confusing. That's why seaborn (and many other libraries) prefer to show the mean as a point.

Here is some example code (note that the return value of sns.violinplot is an ax, and giving it a very different name makes it rather hard to find your way around the matplotlib and seaborn docs and examples).

import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
import seaborn as sns
import pandas as pd
import numpy as np

tips = sns.load_dataset('tips')
tips['day'] = pd.Categorical(tips['day'])

ax = sns.violinplot(data=tips, x='day', y='total_bill', hue='day', inner='quartile', scale='width', dodge=False)
sns.pointplot(data=tips, x='day', y='total_bill', join=False, ci=None, color='yellow', ax=ax)
ax.legend_.remove()

for p in ax.lines:
    p.set_linestyle('-')
    p.set_linewidth(0.8)  # Sets the thickness of the quartile lines
    p.set_color('white')  # Sets the color of the quartile lines
    p.set_alpha(0.8)
for x, (day, violin) in enumerate(zip(tips['day'].cat.categories, ax.collections)):
    line = ax.hlines(tips[tips['day'] == day]['total_bill'].mean(), x - 0.5, x + 0.5, color='black', ls=':', lw=2)
    patch = PathPatch(violin.get_paths()[0], transform=ax.transData)
    line.set_clip_path(patch)  # clip the line by the form of the violin
plt.show()

violinplot with line for the mean

Updated to use a list of lists of data:

data = [np.random.randn(10, 7).cumsum(axis=0).ravel() for _ in range(3)]

ax = sns.violinplot(data=data, inner='quartile', scale='width', palette='Set2')
# sns.pointplot(data=data, join=False, ci=None, color='red', ax=ax) # shows the means
ax.set_xticks(range(len(data)))
ax.set_xticklabels(['I' * (k + 1) for k in range(len(data))])

for p in ax.lines:
    p.set_linestyle('-')
    p.set_linewidth(0.8)  # Sets the thickness of the quartile lines
    p.set_color('white')  # Sets the color of the quartile lines
    p.set_alpha(0.8)
for x, (data_x, violin) in enumerate(zip(data, ax.collections)):
    line = ax.hlines(np.mean(data_x), x - 0.5, x + 0.5, color='black', ls=':', lw=2)
    patch = PathPatch(violin.get_paths()[0], transform=ax.transData)
    line.set_clip_path(patch)
plt.show()

violinplot from lists, with mean line
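
If you really need the x bounds at a given height (as the question asks) rather than clipping, one option is to intersect a horizontal line with the outline of the violin. The following is only a sketch: polygon_x_extent_at is a hypothetical helper, not part of matplotlib or seaborn, and the diamond-shaped outline is made-up stand-in data. With a real plot, the outline would presumably come from violin.get_paths()[0].vertices, the same path that the clipping code above uses.

```python
def polygon_x_extent_at(vertices, y):
    """Return (x_min, x_max) where the horizontal line at height y
    crosses the closed polygon given by a sequence of (x, y) vertices.

    Hypothetical helper for illustration; assumes the polygon is crossed
    by the line at height y, otherwise a ValueError is raised.
    """
    xs = []
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        if y0 == y1:
            # Horizontal (or zero-length) edge: it only counts if it lies at y
            if y0 == y:
                xs.extend([x0, x1])
            continue
        if min(y0, y1) <= y <= max(y0, y1):
            # Linear interpolation along the edge to the requested height
            t = (y - y0) / (y1 - y0)
            xs.append(x0 + t * (x1 - x0))
    if not xs:
        raise ValueError("y lies outside the vertical range of the polygon")
    return min(xs), max(xs)


# Made-up diamond outline centered on x = 0, standing in for a violin shape
diamond = [(0.0, -1.0), (0.4, 0.0), (0.0, 1.0), (-0.4, 0.0)]
print(polygon_x_extent_at(diamond, 0.5))
```

For the diamond, the half-width at y = 0.5 is 0.2. The returned pair could then be passed as the two x values to plt.plot or ax.hlines at the mean height. Clipping is still simpler when the only goal is to draw the line.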

PS: Some further explanation about enumerate(zip(...))

  • for data_x in data: loops through the entries of the list data, first assigning data[0] to data_x, then data[1], and so on.
  • for x, data_x in enumerate(data): loops through the entries of the list data and at the same time increments a variable x from 0 to 1 and finally to 2.
  • for data_x, violin in zip(data, ax.collections): lets data_x loop through the entries of the list data and simultaneously lets a variable violin loop through the list stored in ax.collections (this is where matplotlib stores the shapes of the violins).
  • for x, (data_x, violin) in enumerate(zip(data, ax.collections)): combines the enumeration with the zip.
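
A small stand-alone sketch of that last loop may help. The numbers are made up, and plain strings stand in for the violin shapes that would normally come from ax.collections; both are assumptions for illustration only.

```python
# Made-up data: one list of values per violin
data = [[1.0, 2.0, 3.0], [10.0, 20.0], [5.0]]
# Stand-ins for the shapes stored in ax.collections
violins = ['violin_0', 'violin_1', 'violin_2']

rows = []
for x, (data_x, violin) in enumerate(zip(data, violins)):
    # x is the position on the x-axis, data_x the values of that violin,
    # violin the matching shape
    mean = sum(data_x) / len(data_x)
    rows.append((x, mean, violin))
    print(x, mean, violin)
```

Each pass through the loop yields the position, the data and the shape that belong together, which is what is needed to draw one mean line per violin.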
JohanC
  • I am confused about how to apply this method to my code. I am working with arrays, so I do not really understand what the for loop for x, (day, violin) in enumerate(zip(tips['day'].cat.categories, ax.collections)): is iterating over. What do x and (day, violin) stand for, and what is enumerate(zip(..)) actually doing? Thank you for your help, by the way. I have been looking everywhere for an answer, and a little more help would be wonderful! – Cindy Burker Sep 19 '21 at 22:57
  • Also, when I adapt a similar formula, Python does not understand what "violin" is in the for loop and screams at me. – Cindy Burker Sep 19 '21 at 23:05
  • @TrentonMcKinney Initially it wasn't, then I tried Jupyter and it did – Cindy Burker Sep 19 '21 at 23:38
  • @TrentonMcKinney Many thanks for the additional code. I updated it to leave out the `[:4]` in the loop, and to be more similar to the OP. – JohanC Sep 20 '21 at 06:41