I have watched many videos, read the Seaborn documentation, and checked many websites, but I still haven't found the answer to my question.

This is from the Seaborn documentation:

import seaborn as sns
iris = sns.load_dataset("iris")
ax = sns.boxplot(data=iris, orient="h", palette="Set2")

This code draws one boxplot per numerical variable, all in a single graph.

[Figure: Boxplots for Iris dataset]

When I tried to add hue="species", I got "ValueError: Cannot use hue without x and y". Is there a way to do this with Seaborn? I want to see boxplots of all the numerical variables, split by a categorical variable, so that the graph shows every numerical variable for each species. Since there are 3 species and 4 numerical variables, there will be 12 boxplots in total.

I am learning about EDA (exploratory data analysis), and I think such a graph will help me explore many variables at once.

Thank you for taking the time to read my question!

JohanC

1 Answer

To apply "hue", seaborn needs the dataframe in "long" form. The pandas method DataFrame.melt() can help here. With id_vars=['species'], it converts the four numeric columns into 2 new columns: one called "variable", holding the old column name, and one called "value", holding the values. The resulting dataframe is 4 times as long (600 rows instead of 150), and "value" can then be used for x= and "variable" for y=.
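The reshaping is easier to see on a tiny frame. The following is a minimal sketch that uses a made-up two-row stand-in for the iris data (the values and the name `wide` are for illustration only, not taken from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for the iris data; values are made up for illustration
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "petal_length": [1.4, 6.0],
})

# Keep "species" as the identifier; every other column is stacked into rows
long_df = wide.melt(id_vars=["species"])

# 2 rows x 2 numeric columns become 4 rows with columns species/variable/value
print(long_df)
```

Each original row contributes one row per numeric column, which is why the melted iris frame ends up 4 times as long as the original.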

The long form looks like:

  species      variable  value
0  setosa  sepal_length    5.1
1  setosa  sepal_length    4.9
2  setosa  sepal_length    4.7
3  setosa  sepal_length    4.6
4  setosa  sepal_length    5.0
      ...           ...    ...

import seaborn as sns
from matplotlib import pyplot as plt

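# Load the example dataset (wide form: one column per measurement)
iris = sns.load_dataset("iris")
# Reshape to long form: one row per (species, variable, value)
iris_long = iris.melt(id_vars=['species'])
# One group of boxes per variable on the y-axis, one box per species within each group
ax = sns.boxplot(data=iris_long, x="value", y="variable", orient="h", palette="Set2", hue="species")
plt.tight_layout()
plt.show()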

[Figure: boxplot with hue]
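If the default column names "variable" and "value" make for unhelpful axis labels, melt() also accepts value_vars, var_name and value_name. A small sketch, again on a made-up frame; the names "measurement" and "cm" are my own choice, not something the plot requires:

```python
import pandas as pd

# Hypothetical stand-in for the iris data; values are made up for illustration
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "petal_length": [1.4, 6.0],
})

# value_vars selects which columns to stack; var_name/value_name rename the outputs
long_df = wide.melt(
    id_vars=["species"],
    value_vars=["sepal_length", "petal_length"],
    var_name="measurement",
    value_name="cm",
)

print(long_df.columns.tolist())
```

With a frame shaped like this, the boxplot call would use x="cm" and y="measurement" in place of x="value" and y="variable".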

JohanC