I am trying to create violin plots that show confidence intervals for the mean. I thought an easy way to do this would be to draw a pointplot on top of the violinplot, but this does not work because the two seem to use different indices for the x-axis, as in this example:

import matplotlib.pyplot as plt
import seaborn as sns   

titanic = sns.load_dataset("titanic")
titanic.dropna(inplace=True)
fig, (ax1,ax2,ax3) = plt.subplots(1,3, sharey=True, figsize=(12,4))
#ax1
sns.pointplot("who", "age", data=titanic, join=False,n_boot=10, ax=ax1)
#ax2
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax2)
#ax3
sns.pointplot("who", "age", data=titanic, join=False, n_boot=10, ax=ax3)
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax3)
ax3.set_xlim([-0.5,4])

[Figure: three panels showing the pointplot (ax1), the violinplot (ax2), and both drawn on the same axes (ax3)]

print(ax1.get_xticks(), ax2.get_xticks())

gives: [0 1 2] [1 2 3]

Why are these plots not assigning the same xtick numbers to the 'who'-variable and is there any way I can change this?

I also wonder if there is any way to change the marker for pointplot, because as you can see in the figure, the point is so big that it covers the entire confidence interval. I would like just a horizontal line if possible.

joelostblom

2 Answers

I'm posting my final solution here. The reason I wanted this kind of plot to begin with was to display information about the distribution shape, shifts in the mean, and outliers in the same figure. With mwaskom's pointers and some other tweaks, I finally got what I was looking for.

[Figure: two violin plots side by side; the left one with all data points drawn as lines, the right one the final figure]

The left-hand figure is there as a comparison, with all data points plotted as lines, and the right-hand one is my final figure. The thick grey line in the middle of each violin is the bootstrapped 99% confidence interval of the mean, and the white horizontal line is the mean itself; both come from pointplot. The three dotted lines are the standard 25th, 50th and 75th percentiles, and the lines outside those are the caps of the whiskers of a boxplot that I plotted on top of the violin plot. Individual data points beyond the whisker caps are plotted as lines, since my data usually has a few extreme values that I need to remove manually, like the two points in the violin below.

[Figure: violin plot with two extreme data points]

For now, I am going to continue making histograms and boxplots in addition to these enhanced violins, but I hope to find that all the information is accurately captured in the violinplot so that I can start to rely on it as my main initial data exploration plot. Here is the final code to produce the plots in case someone else finds them useful (or finds something that can be improved). Most of the tweaking went into the boxplot.

import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns  

#change the linewidth to get a thicker confidence interval line
mpl.rc("lines", linewidth=3)
df = sns.load_dataset("titanic")
df.dropna(inplace=True)
x = 'who'
y = 'age'
fig, (ax1,ax2) = plt.subplots(1,2, sharey=True, figsize=(12,6))
#Left hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax1, inner='stick')
#Right hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax2, positions=0)
sns.pointplot(df[x],df[y], join=False, ci=99, n_boot=1000, ax=ax2, color=[0.3,0.3,0.3], markers=' ')
df.boxplot(y, by=x, sym='_', ax=ax2, showbox=False, showmeans=True, whiskerprops={'linewidth':0},
    medianprops={'linewidth':0}, flierprops={'markeredgecolor':'k', 'markeredgewidth':1},
    meanprops={'marker':'_', 'color':'w', 'markersize':6, 'markeredgewidth':1.5},
    capprops={'linewidth':1, 'color':[0.3,0.3,0.3]}, positions=[0,1,2])
#One could argue that this is not beautiful
labels = [item.get_text() + '\nn=' + str(df.groupby(x).size().loc[item.get_text()]) for item in ax2.get_xticklabels()]
ax2.set_xticklabels(labels)
#Clean up
fig.suptitle('')
ax2.set_title('')
fig.set_facecolor('w')

Edit: Added 'n='
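
As a rough illustration of what the "bootstrapped 99% confidence interval of the mean" above refers to, here is a minimal plain-Python sketch of a percentile bootstrap. It is not seaborn's implementation, which may differ in its details; the function name `bootstrap_mean_ci`, its parameters, and the sample ages are made up for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, ci=99, n_boot=1000, seed=0):
    """Percentile bootstrap interval for the mean of `values` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same interval
    n = len(values)
    # Resample with replacement n_boot times and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Cut off (100 - ci) / 2 percent of the resampled means in each tail.
    tail = (100 - ci) / 2 / 100
    low_idx = int(tail * (n_boot - 1))
    high_idx = int((1 - tail) * (n_boot - 1))
    return boot_means[low_idx], boot_means[high_idx]


ages = [22, 38, 26, 35, 54, 2, 27, 14, 4, 58, 20, 39]  # made-up sample
low, high = bootstrap_mean_ci(ages)
print(f"mean={statistics.fmean(ages):.1f}, 99% CI=({low:.1f}, {high:.1f})")
```

Increasing `n_boot` gives a more stable interval at the cost of more computation. The `n_boot=10` in the question makes the plot fast to draw but the interval noisy.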

joelostblom
  • I like your question - violin plots are useful but could be improved with confidence levels and modified ways to show outliers. Have you improved your code since this post? I'd be very interested to see it if you have. – thescoop May 02 '16 at 11:26
  • Thanks and sorry for the late response! I haven't really taken this much further apart from adding two different outlier lines, one for a quantile cutoff in addition to the interquartile range I was using in the plots above. I have also started to favor plotting the data points directly instead of the violins, at least when my sample sizes aren't huge. – joelostblom May 27 '16 at 03:14
  • That's a good idea, opting for violins only when data is sparse over all sample sets. Good luck! – thescoop May 28 '16 at 12:47

violinplot takes a positions argument that you can use to put the violins somewhere else (they currently just inherit the default matplotlib boxplot positions).

pointplot takes a markers argument that you can use to change how the point estimate is rendered.
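
The offset itself can be reproduced with plain matplotlib. The sketch below is only an illustration under assumptions: it calls matplotlib's `Axes.boxplot` directly rather than seaborn, and the helper `boxplot_ticks` and the sample data are hypothetical. It shows the default boxplot tick locations next to the ones obtained by passing `positions` explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def boxplot_ticks(groups, positions=None):
    """Draw a boxplot and return the x tick locations matplotlib chose."""
    fig, ax = plt.subplots()
    ax.boxplot(groups, positions=positions)
    ticks = [float(t) for t in ax.get_xticks()]
    plt.close(fig)  # avoid accumulating open figures
    return ticks


groups = [[22, 30, 41, 35], [19, 27, 33, 48], [2, 4, 9, 14]]  # made-up ages
print(boxplot_ticks(groups))                       # matplotlib's default positions
print(boxplot_ticks(groups, positions=[0, 1, 2]))  # shifted to start at 0
```

Passing the same positions to every layer drawn on a shared axes is what makes the overlays line up.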

mwaskom
  • Great, thanks. I swear I tried the markers argument but couldn't get it to work, even though I did it exactly the same way as I got it working now... oh well. I initially didn't want to move both the violin and the boxplot because I thought it would be more complicated, but it was pretty easy after all. I posted my final figures as well. It would be cool if the default violin plot in seaborn could show outliers and maybe whisker caps (maybe with an 'outer' argument), but I know you said that code is in need of refactoring, so I'll stick to my hacky plot-on-plot solution for a while. – joelostblom Dec 06 '14 at 05:29