I am trying to plot two histograms side by side: the first for the full dataset and the second for a subset of it. For comparability, I want both to use the same class intervals, with bin widths calculated by the Freedman-Diaconis rule (probably the default used by sns.histplot, according to a Stack Overflow answer).

I want the first histogram's bins to be the defaults chosen by sns.histplot().
Then I want to extract the list of bin intervals or break points used by the first plot and pass it as an argument when generating the second histogram.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.datasets import load_boston
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)

# Histograms
f, axs = plt.subplots(1, 2, figsize=(15, 4.5))
a = sns.histplot(df['NOX'], ax=axs[0], color='steelblue')
b = sns.histplot(df.NOX[df.CRIM > 10.73], ax=axs[1], color='darkgreen')
plt.show()

Questions:
1) How can I extract the list of bins used by sns.histplot()?
2) How can I plot two histograms with the same bins, using the Freedman-Diaconis rule?

rahul-ahuja
  • I don't think you can extract the bins from a [seaborn histplot](https://stackoverflow.com/a/65393950/8881141) easily but I might be wrong. – Mr. T Mar 02 '21 at 20:36
  • Thanks @JohanC. Your suggestion worked. If you post it as an answer, I will upvote it. – rahul-ahuja Mar 02 '21 at 20:51
  • @JohanC, I'm also looking for a way to annotate the bars of both histograms with values and/or percentages. Any suggestions? – rahul-ahuja Mar 02 '21 at 20:52

1 Answer

Seaborn usually doesn't give access to its calculations; it just tries to create visualizations. But you can call the same underlying functions to get its results. You need bins = np.histogram_bin_edges(..., bins='auto') (or bins='fd' to force the Freedman-Diaconis estimator), and then sns.histplot(..., bins=bins) for both plots.
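As a quick numeric check of the idea, here is a minimal NumPy-only sketch. The data is synthetic (a stand-in for df['NOX'] and its subset, not the Boston dataset), and the names `full`, `subset` and `fd_width` are made up for illustration. It computes the edges once with bins='fd', compares the resulting bin width with the Freedman-Diaconis formula 2 * IQR / n^(1/3), and counts both samples against the same edges.

```python
import numpy as np

# Synthetic stand-ins (assumption): `full` plays the role of df['NOX'],
# `subset` the role of the filtered rows.
rng = np.random.default_rng(0)
full = rng.normal(loc=0.55, scale=0.12, size=506)
subset = full[full > 0.6]

# Edges are computed once, from the full sample only.
edges = np.histogram_bin_edges(full, bins='fd')

# Freedman-Diaconis target width: 2 * IQR / n^(1/3).
q75, q25 = np.percentile(full, [75, 25])
fd_width = 2 * (q75 - q25) / len(full) ** (1 / 3)

# NumPy rounds the number of bins up, so the actual width
# is at most the FD target width.
actual_width = edges[1] - edges[0]

# Both samples are counted against the same edges.
counts_full, _ = np.histogram(full, bins=edges)
counts_sub, _ = np.histogram(subset, bins=edges)

print(f"{len(edges) - 1} bins, target width {fd_width:.4f}, actual {actual_width:.4f}")
```

Because the subset lies inside the range of the full data, every subset value falls into one of the shared bins. Note that bins='auto' lets NumPy choose between the Sturges and Freedman-Diaconis estimators, so pass bins='fd' if the rule itself matters.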

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston

boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)

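# Compute the bin edges once, from the full dataset, and reuse them in both plots
bins = np.histogram_bin_edges(df['NOX'], bins='auto')
f, axs = plt.subplots(1, 2, figsize=(15, 4.5))
sns.histplot(df['NOX'], bins=bins, color='steelblue', ax=axs[0])
sns.histplot(df[df['CRIM'] > 10.73]['NOX'], bins=bins, color='darkgreen', ax=axs[1])
# Label every non-empty bar; percentages are relative to the full dataset in both panels
for ax in axs:
    for p in ax.patches:
        x, w, h = p.get_x(), p.get_width(), p.get_height()
        if h > 0:
            # The trailing newline lifts the centered text just above the bar
            ax.text(x + w / 2, h, f'{h / len(df) * 100:.2f}%\n', ha='center', va='center', size=8)
    ax.margins(y=0.07)  # extra headroom so the top labels are not clipped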
plt.show()
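If you really want to read the edges back from a plot that is already drawn, the same ax.patches loop can be turned around. The sketch below is an assumption-laden illustration, not a seaborn API: `edges_from_patches` is a hypothetical helper, and it only makes sense when the axes holds a single histogram drawn as plain, full-width bars (no hue, no dodging, no shrink). To stay self-contained it uses matplotlib's own ax.hist on synthetic data; the bars drawn by sns.histplot expose the same get_x() and get_width() methods used above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def edges_from_patches(ax):
    """Rebuild bin edges from the bar rectangles on `ax` (hypothetical helper).

    Assumes a single histogram of adjacent, full-width bars.
    """
    bars = sorted(ax.patches, key=lambda p: p.get_x())
    lefts = [p.get_x() for p in bars]
    right_most = bars[-1].get_x() + bars[-1].get_width()
    return np.array(lefts + [right_most])


rng = np.random.default_rng(1)
data = rng.normal(size=300)  # synthetic data, for illustration only

fig, ax = plt.subplots()
ax.hist(data, bins='auto')

recovered = edges_from_patches(ax)
expected = np.histogram_bin_edges(data, bins='auto')
print(np.allclose(recovered, expected))
```

Computing the edges up front with np.histogram_bin_edges is still the more robust route, since it does not depend on how the bars happen to be drawn.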

JohanC
  • I am trying to label both histograms in a loop. I have figured out how to label them individually with `for p in fig1.patches: ... `, but not how to do this for both figures in a loop. – rahul-ahuja Mar 02 '21 at 21:17