I'm still having trouble understanding what the bias and variance of a specific estimator actually are.
I'm working with the definition of bias as it is found on Wikipedia:
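For reference, and assuming the definition meant is the standard one for an estimator of a fixed quantity, it can be written as follows. The symbols \hat{\theta} (the estimator) and \theta (the true value) are notation chosen for this sketch, and the expectation is taken over random draws of the training data:

```latex
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta,
\qquad
\operatorname{Var}(\hat{\theta}) = \mathbb{E}\!\left[\left(\hat{\theta} - \mathbb{E}[\hat{\theta}]\right)^{2}\right]
```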
If we define the kernel density estimate as
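The usual one-dimensional form is sketched below; the exact notation intended here may differ, so treat the symbols as assumptions: x_1, ..., x_n are the training points, h > 0 is the window width (bandwidth), and K is the kernel. For a Parzen box window, K is the indicator of a unit-width interval:

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) =
\begin{cases}
1 & \text{if } |u| \le \tfrac{1}{2}, \\
0 & \text{otherwise.}
\end{cases}
```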
But how can I apply this to kernel density estimation or, to be more exact, Parzen windows? Can someone at least give me an idea of how the estimated density f_hat(x) relates to bias (and variance)?
Qualitatively, I can already tell that a box window covering the whole data space will have very high bias and no variance, as the estimated density will then be the same flat value everywhere, no matter which training set was drawn.
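One way to make this qualitative picture concrete is a small simulation: fix a point x0, draw many independent training sets, compute the box-window estimate at x0 for each, and compare the average of those estimates with the true density (bias) and look at their spread (variance). The following is only a minimal sketch under stated assumptions: the true density is taken to be a standard normal, and the names `parzen_box` and `bias_variance_at`, the sample size, and the window widths are all hypothetical choices for illustration.

```python
import numpy as np

def parzen_box(x0, sample, h):
    """Box-window (Parzen) density estimate at x0 with window width h."""
    # Count the points that fall inside the window of width h centred on x0.
    inside = np.abs(sample - x0) <= h / 2.0
    return inside.sum() / (sample.size * h)

def bias_variance_at(x0, h, n=200, trials=2000, seed=0):
    """Monte Carlo estimate of the bias and variance of the estimate at x0.

    Assumption: the training data come from a standard normal, so the
    true density at x0 is known and the bias can be computed.
    """
    rng = np.random.default_rng(seed)
    # One density estimate per independently drawn training set.
    estimates = np.array(
        [parzen_box(x0, rng.standard_normal(n), h) for _ in range(trials)]
    )
    true_density = np.exp(-0.5 * x0**2) / np.sqrt(2.0 * np.pi)
    bias = estimates.mean() - true_density  # E[f_hat(x0)] - f(x0)
    variance = estimates.var()              # spread across training sets
    return bias, variance

# Narrow, moderate, and "whole data space" windows.
for h in (0.1, 1.0, 100.0):
    b, v = bias_variance_at(0.0, h)
    print(f"h={h:6.1f}  bias={b:+.4f}  variance={v:.6f}")
```

With the widest window every training point falls inside it, so each estimate equals 1/h regardless of the sample: the variance vanishes while the bias is large. Narrow windows behave the other way around, with small bias and a much larger variance across training sets.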