1

Following this answer I am trying to plot a seaborn histplot with the pdf I get from scipy.stats. The problem is that the plotted pdf is catastrophically out of scale (example below):

from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

params = (20, -1500, 8000)
sample = stats.invgauss.rvs(size=1000, *params)
fig, ax = plt.subplots()
sns.histplot(sample, log_scale=True, kde=False, stat='density')
x0, x1 = ax.get_xlim()
x_pdf = np.exp(np.linspace(np.log(x0),np.log(x1),500))
fitted_params = stats.invgauss.fit(sample)
y_pdf = stats.invgauss.pdf(x_pdf, *fitted_params)
y_cdf = stats.invgauss.cdf(x_pdf, *fitted_params)
ax.plot(x_pdf, y_pdf, 'r', lw=2)
plt.savefig('Pdf_check.png')

pdf missing

On the other hand, cdf prints fine and is consistent with what I would expect:

# same as above
ax.plot(x_pdf, y_cdf, 'r', lw=2)
plt.savefig('Cdf_check.png')

cdf ok

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
VErYSEYmPhY
  • 83
  • 10

1 Answers1

3

I found a solution. Sorry for self-answering but I think it would be useful to have it written here for the future.

With log_scale=True and stat='density', seaborn uses logarithmic bins and rescales bins height such that the integral is 1. So one needs to use instead of pdf a function g such that the integral over the rescaled bins is the same as for the pdf. This gives g(x) = f(x) * x * log(10) and indeed this works:

params = (20, -1500, 8000)
sample = stats.invgauss.rvs(size=1000, *params)
fig, ax = plt.subplots()
sns.histplot(sample, log_scale=True, kde=False, stat='density')
x0, x1 = ax.get_xlim()
x_pdf = np.exp(np.linspace(np.log(x0),np.log(x1),500))
fitted_params = stats.invgauss.fit(sample)
y_pdf = stats.invgauss.pdf(x_pdf, *fitted_params)
y_cdf = stats.invgauss.cdf(x_pdf, *fitted_params)
# rescaled pdf:
y_pdf_rescaled = y_pdf * x_pdf * np.log(10)
ax.plot(x_pdf, y_pdf_rescaled, 'r', lw=2)
plt.savefig('{}/Pdf_check.png'.format(out_dir))

rescaled pdf

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
VErYSEYmPhY
  • 83
  • 10