I have 1-dimensional and 2-dimensional weighted datasets for which I need to calculate optimum bandwidths for kernel smoothing. The formula for the (rule of thumb) optimal bandwidths is

$$h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{1/5}$$

So I need to be able to calculate the standard deviations of distances between points. I can do this with dist for non-weighted data, but how do I do this in R with weighted data?

A toy example with 2-dimensional data - each data point is a coordinate pair (x,y) - ignoring the weights:

df<-data.frame(x=c(1,2,3,4,5),y=c(1,0,1,0,1),weight=c(0.5,1,1,0.1,1));
sd(dist(df[,c("x","y")]));

so

h=((4*sd(dist(df[,c("x","y")]))^5)/(3*nrow(df)))^(1/5);

But I'm not sure how to do this with weighted data. Presumably my weights for each cell of the distance matrix are the products of the weights of the two points whose distance that cell measures - what's the best way to apply these weights to the dist object? Or am I approaching this wrong?
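A minimal sketch of the product-of-weights idea, to make the question concrete. It assumes the weight of a pair (i, j) is w_i * w_j and that a weighted population standard deviation (no small-sample correction) is acceptable; the helper name `weighted_dist_sd` is hypothetical, not an existing R function. It relies on `dist` storing the lower triangle column by column, which is the same order in which `lower.tri` extracts elements from a matrix.

```r
# Hypothetical helper: weighted sd of pairwise distances.
# Assumption: the weight of pair (i, j) is weights[i] * weights[j].
weighted_dist_sd <- function(coords, weights) {
  d <- as.vector(dist(coords))          # lower triangle, column-major order
  wmat <- outer(weights, weights)       # matrix of pairwise weight products
  w <- wmat[lower.tri(wmat)]            # same ordering as the dist object
  w <- w / sum(w)                       # normalise so the weights sum to 1
  m <- sum(w * d)                       # weighted mean distance
  sqrt(sum(w * (d - m)^2))              # weighted population sd
}

# Toy data from the question
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(1, 0, 1, 0, 1),
                 weight = c(0.5, 1, 1, 0.1, 1))

sigma_hat <- weighted_dist_sd(df[, c("x", "y")], df$weight)

# Assumption: n is the raw number of points; an effective sample size
# such as sum(w)^2 / sum(w^2) may be a more appropriate choice.
n <- nrow(df)
h <- (4 * sigma_hat^5 / (3 * n))^(1/5)
```

With equal weights this reduces to the unweighted population standard deviation of the distances, which differs from `sd()` only by the n - 1 correction.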

tzirtzi