
I am trying to get only non-negative values on the x-axis of the plot for my KDE. I know I can limit the x-axis values, but I do not want that. Is there a way to smoothly approximate the KDE so that there are no negative values? All my data are non-negative, but I do not have many sample points (max 500, and I cannot get more). I have also tried adjusting the bandwidth, and it's not looking nice.

for i in range(len(B)):
    ax = sns.kdeplot(data[i], shade=True)
ax.set_xlabel('Maximum detection time')
ax.legend(['N=25,R=20', 'N=30,R=20', 'N=35,R=20'], fontsize=5)
plt.show()

[figure: the resulting KDE plot, extending below x = 0]

Deep
  • Does this answer your question? [Change y range to start from 0 with matplotlib](https://stackoverflow.com/questions/22642511/change-y-range-to-start-from-0-with-matplotlib) – Ahmad Anis Oct 09 '20 at 02:06
  • Seaborn's kdeplot has a `clip=` parameter which might be useful. Note that getting more data would only help a bit, because a gaussian kde only supposes smooth distributions, without a cut-off. – JohanC Oct 09 '20 at 04:52

1 Answer


What goes on behind kdeplot is that a kernel density estimate is fitted by averaging many little normal densities, one centred on each data point (see this illustration), so the bumps centred near zero spill over into negative values.
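To make that concrete, here is a small sketch (with simulated data, not the question's) showing that a gaussian KDE is literally the average of one small normal density per observation, so bumps near 0 necessarily place mass below it:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(1)
x = rng.exponential(0.3, 100)          # non-negative sample

kde = gaussian_kde(x)
h = kde.factor * x.std(ddof=1)         # the 1-D bandwidth scipy actually uses

grid = np.linspace(-1, 2, 500)
# average of one small normal density centred on each data point
manual = norm.pdf(grid[:, None], loc=x, scale=h).mean(axis=1)

print(np.abs(manual - kde(grid)).max())   # ~0: same curve as gaussian_kde
print(manual[grid < 0].max())             # > 0: mass spills below zero
```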

Using some example data:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm

np.random.seed(999)

data = pd.DataFrame({'a':np.random.exponential(0.3,100),
                     'b':np.random.exponential(0.5,100)})  

If you plot the KDE as is, the evaluation does not stop at negative values (and clip= would only hide the spill-over, not remove it):

for i in data.columns:
    ax = sns.kdeplot(data[i],shade=True,gridsize=200)

[figure: KDE curves extending below x = 0]
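A quick way to see that this spill-over is real probability mass, and that clipping the view would simply discard it, is to integrate the estimate on each side of zero. A sketch with simulated data and scipy's gaussian_kde:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.exponential(0.3, 100)          # non-negative sample

kde = gaussian_kde(x)
full = np.linspace(-1, 3, 2000)        # grid including the negative spill-over
clipped = np.linspace(0, 3, 2000)      # grid starting at zero

total = trapezoid(kde(full), full)     # close to 1: all the mass
kept = trapezoid(kde(clipped), clipped)
print(total, kept)                     # kept is noticeably below total
```

Whatever mass sits below zero is simply lost if you only restrict the axis.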

If you add cut=0, the curve stops abruptly at the smallest observation, which looks odd. As you pointed out, you can also simply truncate it at 0:

[figure: KDE curves cut off abruptly at the data minimum]

There are two solutions proposed in this post on Cross Validated. Below is a Python implementation of the R code provided by @whuber:

def trunc_dens(x):
    # First fit just to obtain the bandwidth h
    kde = sm.nonparametric.KDEUnivariate(x)
    kde.fit()
    h = kde.bw
    # Weight each point by the inverse of its kernel's mass above 0,
    # so the mass that would spill below zero is put back
    w = 1 / (1 - norm.cdf(0, loc=x, scale=h))
    # Refit with those weights (weights require fft=False)
    d = sm.nonparametric.KDEUnivariate(x)
    d.fit(bw=h, weights=w / len(x), fft=False)
    d_support = d.support
    d_dens = d.density
    # Zero out whatever is left below 0
    d_dens[d_support < 0] = 0
    return d_support, d_dens

We can check how it looks for data['a']:

kde = sm.nonparametric.KDEUnivariate(data['a'])
kde.fit()
plt.plot(kde.support,kde.density)
_x,_y = trunc_dens(data['a'])
plt.plot(_x,_y)

[figure: plain KDE vs truncated KDE for data['a']]
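As a sanity check on the weighting idea, here is a standalone NumPy version of the same estimator (simulated data, normal-reference bandwidth, independent of statsmodels): each bump is rescaled so that its mass on [0, ∞) is exactly 1, so the whole estimate integrates to 1 over the non-negative axis:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

rng = np.random.default_rng(42)
x = rng.exponential(0.3, 200)
h = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)   # normal-reference bandwidth

w = 1 / (1 - norm.cdf(0, loc=x, scale=h))       # @whuber's boundary weights
grid = np.linspace(0, x.max() + 4 * h, 4000)
dens = (w * norm.pdf(grid[:, None], loc=x, scale=h)).mean(axis=1)

area = trapezoid(dens, grid)
print(area)                                     # ~1.0 on [0, inf)
```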

You can plot it for both:

fig,ax = plt.subplots()
for i in data.columns:
    _x,_y = trunc_dens(data[i])
    ax.plot(_x,_y)

[figure: truncated KDEs for both columns]
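As a hedged aside (this may or may not be the second recipe in the linked post), another widely used boundary correction is reflection: mirror the sample about 0, fit an ordinary KDE to the doubled sample, and keep twice the density on the non-negative side:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
x = rng.exponential(0.3, 200)

kde = gaussian_kde(np.concatenate([x, -x]))   # KDE of sample + mirror image
grid = np.linspace(0, x.max() + 1.5, 2000)
dens = 2 * kde(grid)                          # fold the negative half back

area = trapezoid(dens, grid)
print(area)                                   # ~1.0 on [0, inf)
```

This needs only scipy, and by construction the estimate has zero slope rather than zero value at the boundary.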

StupidWolf