1

The following plot and its code were generated in R (source). How can I replicate this quality of a histogram in Python code using scipy.stats?

enter image description here

x = rgamma(1000, 3, .1)
hist(x, prob=T, br=30, col="skyblue2", main="n = 1000: GAMMA(3, .1)")
curve(dgamma(x, 3, .1), add=T, lwd=2, col="orange")
abline(v = 55.81, lwd=2, col="blue")
abline(v = 53.2232, lwd=2, col="brown", lty="dotted")

The R plot above is alot better than Python's scipy.stats histograms, one example shown below, but I know there are alternative plot libraries for python

enter image description here

from scipy.stats import dgamma
r = dgamma.rvs(1.1, size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
develarist
  • 1,224
  • 1
  • 13
  • 34
  • https://seaborn.pydata.org/generated/seaborn.histplot.html – Chris Jan 07 '21 at 03:46
  • I think `seaborn.histplot` only lets you superimpose the KDE curve (kernel density estimator), which is inaccurate compared to the actual parameter-fitted continuous distributions in `scipy.stats`. even the R code is not using KDE, it actually lays down `curve` according to the continuous distribution `dgamma` – develarist Jan 07 '21 at 03:52
  • @develarist are you able to include the source data that you use as a basis to generate the plot, ideally in the format of a `pandas` dataframe? Once you create a dataframe, you can paste the results here: with `df.to_dict()` assuming a dataframe called `df`. – David Erickson Jan 07 '21 at 04:03
  • the source of the R code created data as `x = rgamma(1000, 3, .1)`, while the data in the Python example is `r = dgamma.rvs(a, size=1000)` – develarist Jan 07 '21 at 04:12
  • @develarist I get `NameError: name 'a' is not defined` – David Erickson Jan 07 '21 at 04:28
  • see edit, `a=1.1`. the link for that example is linked in the question since showing all lines of code for the histogram that is not wanted doesn't help with finding an answer https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.dgamma.html#scipy.stats.dgamma – develarist Jan 07 '21 at 04:30

1 Answers1

2

You could use a seaborn histplot + kdeplot if you want the kde to be a different color. Regarding your comment and having the kde as a different color, I commented on this github here where someone had a similar question (I believe this is best way to do this in 2021). So, we are able to get very close to what you have posted with R with a little bit more code. There are many other parameters that you can pass directly to sns.histplot and sns.kdeplot or if the parameter doesn't exist you can add stuff with plt e.g. plt.title('Seaborn Histplot Example') or add stuff to axes with ax..

from scipy.stats import dgamma
import matplotlib.pyplot as plt
import seaborn as sns
r = dgamma.rvs(1.1, size=1000)
sns.set_style("white")
sns.set_context("talk")
fig, ax = plt.subplots(figsize=(24,12))
sns.histplot(r, color='deepskyblue', stat='density')
sns.kdeplot(r, color='orange')
plt.title('Seaborn Histplot Example', size=24, fontweight='bold')
sns.histplot(r, color='deepskyblue', stat='density', edgecolor="black")
sns.kdeplot(r, color='orange')
plt.axvline(2.8, 0, 0.95, color='blue')
plt.axvline(2.4, 0, 0.95, color='brown', linestyle='--')
ax.tick_params(left=True, bottom=True)
plt.show()

enter image description here

David Erickson
  • 16,433
  • 2
  • 19
  • 35
  • 2
    `seaborn.distplot` will be deprecated it says so I'll try `seaborn.displot(kind='hist')`. u could too if you want – develarist Jan 07 '21 at 08:07
  • after trying your answer with `displot` instead of `distplot`, the color settings don't work since `hist_kws` and `kde_kws` are unrecognized. do you know how to set the black edge color and the blue bars in `displot`? – develarist Jan 07 '21 at 21:20
  • @develarist good timing. I just finished looking into this. You are right. We should be doing this the 2021 way. See updated answer. – David Erickson Jan 07 '21 at 21:26
  • thanks for the edit. I don't think R's `curve` function is quite the same as `sns.kdeplot(r, color='orange')` however. At least the documentation contains nothing about KDE. not saying you should altogether remove `kdeplot`, but it would be worth plotting it together with whatever *is* the python equivalent of R's `curve` function to see how `curve` and `kdeplot` compare. https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/curve – develarist Jan 07 '21 at 21:29
  • 1
    @develarist `kdeplot` has a lot of options to adjust the curve with the parameters for `bw`, `bw_adjust`, `cut`, `clip`. Please see the "Notes" section here: https://seaborn.pydata.org/generated/seaborn.kdeplot.html To be honest, at this level of detail with the curve, this is not my area of expertise and I am not a `R` user, so unfortunately, I wouldn't be able to compare. – David Erickson Jan 07 '21 at 21:53
  • @develarist however, this question is a really good one on this topic. The accepted answer explains it well: https://stackoverflow.com/questions/60596102/seaborn-selected-kde-bandwidth-is-0-cannot-estimate-density – David Erickson Jan 07 '21 at 21:57