I have the following code using the seaborn library in python that plots a grid of histograms from data from within the seaborn library:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

penguins = sns.load_dataset('penguins')

sns.displot(penguins, x='bill_length_mm', col='species', row='island', hue='island',
            height=3, aspect=2, kind='hist', palette='viridis',
            facet_kws=dict(margin_titles=True, sharex=False, sharey=False))

plt.show()

This produces the following grid of histograms: [image: grid of histograms]

This gives one histogram per species-island combination, each showing the frequency distribution of penguin bill lengths. The columns of the grid are organized by species and the rows by island. Seaborn automatically titles each column with its species because of the argument col='species', labels the y-axis of each row "Count", and colors each row by island because of the argument hue='island'.

What I am trying to do is override these default automatic labels to add my own customization. Specifically what I want to do is replace the top axes labels with just "A", "B", and "C" below a "Species" header, and on the left axis, replace each "Count" instance with the names of each island, but all of these labels in much bigger font size.

This is what I am trying to produce: [image: desired grid with custom labels]

What I am trying to figure out is, how can I "override" the automatic labelling from the above seaborn arguments so that I can print my custom histogram grid labels, but done in a dynamic way, such that if there were potentially another data set with more islands and more species, the intended labelling organization would still be produced?

Trenton McKinney

1 Answer

The sns.displot function returns a FacetGrid object. This object lets you customize the row and column titles and the axis labels with the methods set_titles() and set_axis_labels(). However, for a figure as customized as the one you want, I'm afraid you'll have to overwrite the labels and titles directly through FacetGrid.axes, which gives you access to a NumPy ndarray of matplotlib Axes objects.

g = sns.displot(penguins, x='bill_length_mm', col='species', row='island', hue='island',height=3,
                aspect=2,facet_kws=dict(margin_titles=True, sharex=False, sharey=False), kind='hist', palette='viridis',
                legend=False)  # Do not display the legend

g.set_titles(row_template="")  # Remove the marginal titles on the right side

# Rewrite the top-row axis titles
custom_colnames = ["A", "B", "C"]
for i, ax in enumerate(g.axes[0]):
    ax.set_title(custom_colnames[i], fontsize=14)  # Adjust the fontsize

# Rewrite the first-col axis ylabels with the island names (the rows are faceted by island)
custom_rownames = penguins["island"].unique()
for i, ax in enumerate(g.axes[:, 0]):
    ax.set_ylabel(custom_rownames[i], fontsize=14, rotation=0, ha="right")

# Remove the last-row axis xlabels
for ax in g.axes[-1]:
    ax.set_xlabel("")

# Add a figure suptitle
plt.gcf().suptitle("Species", y=1.05, fontsize=16)

Very custom, but it should give you the desired figure

thmslmr
  • Thank you very much, this looks like exactly what I was looking for here. I understand that the "A", "B" and "C" need to be hard-coded here, such that a fourth species in my data would not get its grid column automatically labeled as "D", but for my own understanding of how to use this code, let's say I rewrote my source data to map each species as "A", "B", "C", "D", "E", "F", etc. So from the original code, the grid columns would get headers as "Species = A", "Species = B", "Species = C", "Species = D", etc. How can I remove the "Species =", so that I can just get A, B, C , D as the headers? – LostinSpatialAnalysis Mar 03 '23 at 20:10
  • hmmm I think I did it correctly, under "# Rewrite the first-col axis ylabels" I changed `custom_rownames = penguins["species"].unique()` to `custom_rownames = penguins["island"].unique()` and it appears to have worked. I think this makes sense, but I am still learning. – LostinSpatialAnalysis Mar 03 '23 at 20:21
  • Yes, exactly! I put `custom_colnames = ["A", "B", "C"]` as a hard-coded example to match your figure, but you can use any column of interest in the dataframe with `df[colname].unique()`. @LostinSpatialAnalysis Does it answer the question? – thmslmr Mar 03 '23 at 21:07
  • Sounds good! Thank you! Though I encountered a small issue when trying to modify this code. I have data that has the "Species" column example as numbers 1, 2, 3, 4, 5, etc. instead of A, B, C, D, E, F, etc. But when I try to make the FacetGrid you created above, the column order goes 1, 0, 2, instead of 0, 1, 2. I don't understand this! Shouldn't python know to order these columns as 0, 1, 2? Your code makes so much sense to me, so I am confused why it is not registering the order, such as it would know letter-named columns should be ordered alphabetically. Thank you! – LostinSpatialAnalysis Mar 03 '23 at 21:59
  • I think I got it, just run `np.sort(custom_colnames)`. Your code works great! Thank you again! – LostinSpatialAnalysis Mar 03 '23 at 22:10
  • @LostinSpatialAnalysis Nice ! Feel free to accept the answer ;-) – thmslmr Mar 04 '23 at 02:19