I am following the steps at the end of this post to implement a transformed kernel density estimate (KDE) on the bounded support [0, +∞). The transformation trick avoids the boundary bias of the standard KDE on a bounded support (here, near zero). Basically, the KDE allocates weight to the region outside the support, where no observations can exist, so it severely underestimates the PDF at the boundary (as the figure below shows).

1) Regular approach (we observe the undesirable boundary bias of the KDE near zero)

# sample from exponential distribution
obs=rexp(5e2)
hist(obs,freq=FALSE)
k=density(obs)
lines(k$x,k$y)

[Figure: histogram of obs with the standard KDE overlaid]
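The boundary bias can also be checked numerically. The following is a minimal sketch, assuming a fixed seed (not in the original) and using a simple Riemann sum over the grid returned by `density()`; the variable names `mass.below.zero` and `kde.at.zero` are illustrative only.

```r
# Assumption: fixed seed for reproducibility (not part of the original post)
set.seed(1)
obs <- rexp(5e2)
k <- density(obs)

# density() evaluates the KDE on an equally spaced grid k$x
dx <- diff(k$x)[1]

# probability mass the KDE places on x < 0, where no observation can exist
mass.below.zero <- sum(k$y[k$x < 0]) * dx

# KDE value at the boundary (linear interpolation on the grid),
# to be compared with the true density dexp(0) = 1
kde.at.zero <- approx(k$x, k$y, xout = 0)$y

print(c(mass.below.zero = mass.below.zero,
        kde.at.zero = kde.at.zero,
        true.at.zero = dexp(0)))
```

The mass leaked below zero is exactly what is missing from the estimate near the boundary.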

2) Transformation approach

[Image: transformation formula]
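The standard change-of-variables argument behind the log transformation can be sketched as follows. This is an assumption about what the formula expresses, chosen to be consistent with step 3 of the code, where the density of the pseudo observations is divided by the original values; the symbols $X$, $Y$, $f$ and $F$ are notation introduced here.

```latex
\begin{align*}
Y &= \log X, \qquad X > 0,\\
F_X(x) &= P(X \le x) = P(Y \le \log x) = F_Y(\log x),\\
f_X(x) &= \frac{d}{dx} F_Y(\log x) = \frac{f_Y(\log x)}{x},\\
\hat f_X(x) &= \frac{\hat f_Y(\log x)}{x}.
\end{align*}
```

Here $\hat f_Y$ is the ordinary KDE of the log-transformed observations. Note that it has to be evaluated at $\log x$ for the same $x$ that appears in the denominator.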

# 1) log transform the obs
pseudo.obs=log(obs)
# 2) estimate the density of the pseudo obs with KDE
pseudo.k=density(pseudo.obs,n=length(obs))
# 3) estimate the density of the original obs
t.density=pseudo.k$y/obs
# plot estimation
lines(obs,t.density)

Instead of getting something similar to the blue line below, as I should:

[Figure: expected result, with the transformed estimate in blue]

I'm getting this horrible thing:

[Figure: the result actually obtained]

Antoine
  • I guess you should use something like `pseudo.k$x` and not `obs` to plot `t.density`. –  Sep 30 '15 at 09:03
  • just tried it, it still gives terrible results... – Antoine Sep 30 '15 at 09:06
  • Yes, but is the calculation of `t.density` correct? –  Sep 30 '15 at 09:09
  • well, I am estimating the distribution of the pseudo obs with a KDE and then dividing by the original values, which seems to be faithful to the formula above... – Antoine Sep 30 '15 at 09:15
  • `pseudo.k$x` won't work because it deals with the transformed space, whereas we want a plot in the original space – Antoine Sep 30 '15 at 09:24
  • I just gave you a hint. `obs` is not the correct space either, if I am not mistaken. –  Sep 30 '15 at 09:27

1 Answer

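I could use a KDE on my stupidity without using any transformation, because it is unbounded. The problem was that `density()` returns its estimate on an equally spaced grid (`pseudo.k$x`), not at the observations, so dividing `pseudo.k$y` by `obs` paired each density value with an unrelated point. Here is some code that works: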

# everything before is the same
# 2) estimate the density of the pseudo obs with KDE
pseudo.k=approxfun(density(pseudo.obs))
# 3) estimate the density of the original obs
# evaluation grid in the original space (named x.grid to avoid masking seq())
x.grid=seq(min(obs),max(obs),length.out=500)
# pseudo.k is vectorized, so no loop is needed
t.density=pseudo.k(log(x.grid))/x.grid
# plot result
lines(x.grid,t.density,col="red")

[Figure: histogram of obs with the transformed KDE in red]
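As a sanity check, the same idea can be wrapped in a small function and integrated numerically. This is only a sketch under stated assumptions: the name `log.kde`, the fixed seed, the grid starting at `1e-3`, and the trapezoidal rule are all choices made here, not part of the original answer.

```r
# Assumption: fixed seed for reproducibility (not part of the original post)
set.seed(1)
obs <- rexp(5e2)

# log.kde is a hypothetical helper: it returns a function estimating the
# density of obs on (0, +Inf) via a KDE of log(obs)
log.kde <- function(obs) {
  # interpolate the KDE of the pseudo observations; 0 outside the grid
  pseudo.k <- approxfun(density(log(obs)), yleft = 0, yright = 0)
  function(x) {
    out <- numeric(length(x))   # density is 0 for x <= 0
    pos <- x > 0
    out[pos] <- pseudo.k(log(x[pos])) / x[pos]
    out
  }
}

f.hat <- log.kde(obs)

# trapezoidal integration over a grid in the original space
x.grid <- seq(1e-3, max(obs), length.out = 2000)
y <- f.hat(x.grid)
area <- sum(diff(x.grid) * (head(y, -1) + tail(y, -1)) / 2)
print(area)
```

The area should come out close to 1, and by construction the estimate puts no mass on negative values, which is the point of the transformation.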

Antoine