
What is the correct way to create a violin plot that has one violin split by hue?

I've tried different approaches, and it seems the only way is to create a feature that has the same value for every entry in the dataset and pass that feature's name as x.

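import matplotlib.pyplot as plt
import seaborn as sns

# train_cleansed_height: the asker's DataFrame (not shown) with a numeric 'height'
# column, a 'feature' column used for hue, and a 'workaround' column that holds
# the same value in every row
fig = plt.figure(figsize=(20, 8))

# 1) x and hue use the same column
ax = fig.add_subplot(1, 3, 1)
sns.violinplot(x='feature', y='height',
               data=train_cleansed_height,
               scale='count',
               hue='feature', split=True,
               palette='seismic',
               inner='quartile',
               ax=ax)

# 2) the constant 'workaround' column as x, split by hue
ax = fig.add_subplot(1, 3, 2)
sns.violinplot(x='workaround', y='height',
               data=train_cleansed_height,
               scale='count',
               hue='feature', split=True,
               palette='seismic',
               inner='quartile',
               ax=ax)

# 3) no x at all
ax = fig.add_subplot(1, 3, 3)
sns.violinplot(x=None, y='height',
               data=train_cleansed_height,
               scale='count',
               hue='feature', split=True,
               palette='seismic',
               inner='quartile',
               ax=ax)
ax.set_xlabel('x=None')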

[Image: violin plot example]

But is it the correct way?

ImportanceOfBeingErnest
Ilya Chernov
  • Is the middle plot ("workaround") the desired result? If so, I think you are correct and you have to trick seaborn with a single x-value. As far as I understand, `hue=` is supposed to be used *in addition* to `x=`, not instead of it. – Diziet Asahi Feb 15 '18 at 14:11
  • It _seems_ so to me too. It's just that I'd like to make sure, because passing a Series with the same value at every index feels strange. Thanks anyway. – Ilya Chernov Feb 15 '18 at 14:56

1 Answer


The x argument of seaborn.violinplot determines the categorical positions: each unique value in x gets its own position. If a single position is desired, x must contain exactly one unique value. If the same column is used for both x and hue, x has two unique values, so two positions are drawn, as seen in the first plot.
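A quick way to check the claim about positions is to count the x ticks seaborn creates for each choice of x. This is a sketch assuming seaborn and matplotlib are installed, using the off-screen Agg backend and a small hypothetical DataFrame (the column names mirror the question, the values are made up):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two groups of 50 heights each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height": np.concatenate([rng.normal(170, 8, 50), rng.normal(160, 7, 50)]),
    "feature": ["A"] * 50 + ["B"] * 50,
})

fig, (ax_same, ax_const) = plt.subplots(1, 2)

# x and hue share the column: x has two unique values -> two positions
sns.violinplot(data=df, x="feature", y="height",
               hue="feature", split=True, ax=ax_same)

# x is one repeated label: one unique value -> one position, split by hue
sns.violinplot(data=df, x=["AB"] * len(df), y="height",
               hue="feature", split=True, ax=ax_const)

n_positions_same = len(ax_same.get_xticks())
n_positions_const = len(ax_const.get_xticks())
print(n_positions_same, n_positions_const)
```

The number of ticks follows the number of unique x values, not the number of hue levels.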

Instead, pass a single repeated label as x, for example

sns.violinplot(x=["some label"]*len(df), ...)

to draw the violin at a single position, split by hue.
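If building a list feels awkward, an equivalent sketch adds the constant label as a column with pandas' `DataFrame.assign` and passes the column name as x. This is the same trick as the question's constant `workaround` column, just built inline on a copy so the original DataFrame is not modified. The column name `group`, the label `"all"` and the data are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data with a two-level hue column
df = pd.DataFrame({
    "height": [1.2, 2.3, 2.9, 3.1, 4.0, 1.8, 2.2, 3.5, 4.4, 5.0],
    "feature": ["A"] * 5 + ["B"] * 5,
})

fig, ax = plt.subplots()
# assign() returns a copy with a constant 'group' column; df itself is unchanged
sns.violinplot(data=df.assign(group="all"), x="group", y="height",
               hue="feature", split=True, ax=ax)
```

Both forms give x a single unique value; which one to use is mostly a matter of taste.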

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(1)

# 50 values per group: A from a scaled and shifted binomial, B from a Rayleigh distribution
a = np.concatenate((np.random.binomial(3, 0.3, 50) * 2.2 + 1, np.random.rayleigh(3, 50)))
df = pd.DataFrame({"height": a, "feature": ["A"] * 50 + ["B"] * 50})

fig = plt.figure()

# Left: x and hue are the same column, so x has two unique values -> two positions
ax1 = fig.add_subplot(1, 2, 1)
sns.violinplot(x='feature', y='height',
               data=df,
               scale='count',
               hue='feature', split=True,
               palette='seismic',
               inner='quartile',
               ax=ax1)

# Right: x is one repeated label -> a single position, split by hue
ax2 = fig.add_subplot(1, 2, 2)
sns.violinplot(x=["AB"] * len(df), y='height',
               data=df,
               scale='count',
               hue='feature', split=True,
               palette='seismic',
               inner='quartile',
               ax=ax2)

plt.tight_layout()
plt.show()
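On seaborn 0.13 and later, the `scale` parameter used above is deprecated in favour of `density_norm`, so the code emits a warning there. A version-tolerant sketch of the right-hand plot, assuming the same DataFrame layout (recreated here with hypothetical values so the block runs on its own), picks the keyword from `sns.__version__`. It uses `inner="quart"`, which both older and newer releases accept for the quartile lines:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with the same layout as the answer's df
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "height": np.concatenate([rng.normal(3, 1, 50), rng.rayleigh(3, 50)]),
    "feature": ["A"] * 50 + ["B"] * 50,
})

# seaborn 0.13 renamed `scale` to `density_norm`; older releases only know `scale`
major, minor = (int(p) for p in sns.__version__.split(".")[:2])
count_kw = {"density_norm": "count"} if (major, minor) >= (0, 13) else {"scale": "count"}

fig, ax = plt.subplots()
sns.violinplot(data=df, x=["AB"] * len(df), y="height",
               hue="feature", split=True,
               palette="seismic", inner="quart",
               ax=ax, **count_kw)
```

If only one seaborn version needs to be supported, passing the matching keyword directly is simpler.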


ImportanceOfBeingErnest