
The inner boxplots that I get (through specification of inner='box') when generating seaborn violinplots are not accurate for my actual data. See example plot below. The actual data extend to the tip of the thin tails. But the boxplots end well within the area of the violin.

Assuming these boxplots are supposed to be representing the quartiles, and not standard deviations or something, then they are inaccurate.

My code invoking seaborn violinplot is below. As you can see, I have set the option cut=0, which should mean that the tails of the violin plot do not extend beyond my extreme data at all. From inspection, the extents of the violin are indeed in the correct places, but the inner boxplots are not even close to right.

sns.violinplot(x='Policy', y='LMP', order=cat_order, data=df, inner='box', scale='area', bw=0.2, cut=0, linewidth=0.5, ax = axes)

Does anyone have any insight into what seaborn does here? Are they deciding (only for purposes of the boxplot) that some of my data are outliers, and excluding them? Any ideas for how to control that?

[Image: example violin plot with inner boxplots]

Emily Beth

1 Answer

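OK, I tracked down the answer to my own question. While I'm used to boxplots whose whiskers run all the way to the minimum and maximum of the data, Seaborn uses another (apparently common) convention: the box itself spans the first to third quartiles, and the whiskers extend only to the most extreme data points that lie within 1.5 times the "interquartile range" (IQR) beyond those quartiles. Anything further out is treated as an outlier and is not covered by the inner boxplot, which is why it stops well short of the tips of the violin.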

See here for information on Seaborn boxplots: http://seaborn.pydata.org/tutorial/categorical.html#distributions-of-observations-within-categories

See here for definition of IQR: http://stattrek.com/statistics/dictionary.aspx?definition=Interquartile%20range
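
To make the 1.5 × IQR rule concrete, here is a minimal sketch of how whisker ends can be computed by hand with NumPy. This is my own illustration of the convention, not Seaborn's actual source; the function name `box_whisker_limits`, its `whis` argument, and the sample data are all made up for the example.

```python
import numpy as np

def box_whisker_limits(values, whis=1.5):
    """Hypothetical helper: box and whisker positions under the whis * IQR rule."""
    values = np.asarray(values, dtype=float)
    # Quartiles define the ends of the box and the median marker.
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers stop at the most extreme data points still inside the fences.
    low = values[values >= q1 - whis * iqr].min()
    high = values[values <= q3 + whis * iqr].max()
    return low, q1, median, q3, high

# Made-up data with one long tail, similar in spirit to a thin violin tip.
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)
low, q1, median, q3, high = box_whisker_limits(sample)

print("whiskers:", low, high)
print("data range:", sample.min(), sample.max())
```

With this sample the upper whisker stops at 9 even though the data run up to 50, which is the same kind of gap as between the inner boxplot and the violin tips in my plot.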

Emily Beth