I have a pandas series x:

0      -0.000069
1      -0.000059
2      -0.000025
3      -0.000021
4      -0.000021
          ...   
1036    0.000032
1037    0.000033
1038    0.000052
1039    0.000055
1040    0.000092
Name: c, Length: 1041, dtype: float64

I would like to plot a probability density function together with a histogram, for which I used seaborn.distplot:

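import matplotlib.pyplot as plt
import seaborn as sns

# Histogram plus KDE curve of x (note: distplot is deprecated since seaborn 0.11)
sns.distplot(x, hist=True, kde=True, bins=100,
             hist_kws={'edgecolor':'black', 'color': 'r'},
             kde_kws={'linewidth': 1, 'color': 'b'})

plt.xlim(-0.00002, 0.00002)
plt.ylim(bottom=0)  # recent matplotlib releases no longer accept the ymin keyword
plt.xlabel("x")
plt.ylabel("probability")
# Show the x tick labels in scientific notation
plt.ticklabel_format(style='sci', axis='x', scilimits=(0,0))

plt.show()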

As a result, I get the following figure:

[Figure: histogram and KDE plot of x]

As shown, the vertical axis appears to represent a count, but I want (and expected from this code) a probability. I am quite confused, because identical code works properly for another pandas series. For example, using the same code with a different series (and different labels, etc.), I was able to produce the following correct figure:

[Figure: histogram and KDE plot of the other series]

Any idea why this code isn't working for my first series, and/or possible solutions?

jstaxlin

2 Answers

The "problem", so to speak, is that you labeled your y-axis "probability" when it does not show a probability. What is plotted is a probability density; the probability is the area under the curve, and the total area is equal to 1.

In your first plot, the density is very large but the x-values are very small, so the product of the two (the area) remains consistent with a probability. A density can be far greater than 1 as long as the total area stays at 1. See probability density function for more info.
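To make the area argument concrete, here is a minimal sketch using NumPy only. The data are a hypothetical stand-in for the series in the question (1041 normally distributed values with a spread of about 3e-5, which is an assumption, not the asker's actual data). It shows that the bar heights of a density-normalized histogram can be very large while the total area is still 1.

```python
import numpy as np

# Hypothetical stand-in for the series x: 1041 values on the order of 1e-5.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=3e-5, size=1041)

# density=True normalizes the histogram so that its total area is 1.
heights, edges = np.histogram(x, bins=100, density=True)
widths = np.diff(edges)

max_height = heights.max()              # much greater than 1 for data this narrow
total_area = np.sum(heights * widths)   # 1 up to floating-point rounding

print(f"largest bar height: {max_height:.3g}")
print(f"total area: {total_area:.6f}")
```

Because each bin is only a few millionths wide, the heights have to be in the thousands for the areas to add up to 1, which is why the axis can look like a count.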

Diziet Asahi

I would remove your plt.ylabel("probability") and either replace it with a label that correctly describes the axis or not label the axis at all.

I recommend using plt.ylabel("probability density").
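If the goal really is a y-axis that reads as a probability, one possible approach (a sketch, not something either answer proposes) is to weight every observation by 1/N so that each bar is the fraction of points falling in that bin. The data below are again a hypothetical stand-in for the series in the question.

```python
import numpy as np

# Hypothetical stand-in for the series x.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=3e-5, size=1041)

# Each observation contributes 1/N, so a bar's height is the share of
# observations in that bin rather than a density.
weights = np.full(x.shape, 1.0 / x.size)
probs, edges = np.histogram(x, bins=100, weights=weights)

total_prob = probs.sum()  # the bar heights themselves sum to 1

print(f"largest per-bin probability: {probs.max():.4f}")
print(f"sum of bar heights: {total_prob:.6f}")
```

The same weights array can be passed to matplotlib's plt.hist(x, bins=100, weights=weights), in which case the label "probability" is accurate. A KDE curve is on the density scale, so it would not line up with bars drawn this way.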