6

In R, using the np package, I have created the bandwidths for a conditional density. What I would like to do is, given some new conditional vector, sample from the resulting distribution.

Current code:

library('np')
# Generate some test data.
somedata = data.frame(replicate(10,runif(100, 0, 1)))
# Conditional variables.
X <- data.frame(somedata[, c('X1', 'X2', 'X3')])
# Dependent variables.
Y <- data.frame(somedata[, c('X4', 'X5', 'X6')])
# Warning, this can be slow (but shouldn't be too bad).
bwsome = npcdensbw(xdat=X, ydat=Y)
# TODO: Given some vector t of conditional data, how can I sample from the resulting distribution?

I am quite new to R, so while I did read the package documentation, I haven't been able to figure out if what I vision makes sense or is possible. If necessary, I would happily use a different package.

gdoug
  • 715
  • 1
  • 5
  • 16
  • I get: `Error: could not find function "npcedensbw"`. Wheb I look at the available functions in the np-package I don't see any by that name. When I re-run with `npcdensbw` and then `plot` the result, I see 6 X vatriable. Now... what was the question exactly? – IRTFM Sep 29 '15 at 02:14
  • Indeed, I am working with multivariate data, both in the conditional and dependent variables. What I would like to do is sample from the determined distribution. Given some new vector for the conditional/independent variables, I want to sample according to the distribution given the conditional variables. In a simpler example, if both x and y were single dimensional, I would want to fix x such that there is a distribution on y, and then sample within that distribution. I want to do the same thing here. Is that more clear? – gdoug Sep 29 '15 at 05:16
  • Just to make sure I understand the question correctly: how does your case differ from FAQ 2.49 in https://cran.r-project.org/web/packages/np/vignettes/np_faq.pdf? – coffeinjunky Aug 14 '17 at 12:38
  • So, if I understand it correctly.. you want to compute stuff like P(X4|X1), or more complex... P(X5|X1,X2,X3)... or even P(X1|X4)... is this correct? – zwep Aug 15 '17 at 08:55

1 Answers1

1

Here is the Example 2.49 from: https://cran.r-project.org/web/packages/np/vignettes/np_faq.pdf , it gives the following solution for for 2 variables:

###
library(np)
data(faithful)
n <- nrow(faithful)
x1 <- faithful$eruptions
x2 <- faithful$waiting
## First compute the bandwidth vector
bw <- npudensbw(~x1 + x2, ckertype = "gaussian")
plot(bw, view = "fixed", ylim = c(0, 3))
## Next generate draws from the kernel density (Gaussian)
n.boot <- 1000
i.boot <- sample(1:n, n.boot, replace = TRUE)
x1.boot <- rnorm(n.boot,x1[i.boot],bw$bw[1])
x2.boot <- rnorm(n.boot,x2[i.boot],bw$bw[2])
## Plot the density for the bootstrap sample using the original
## bandwidths
plot(npudens(~x1.boot+x2.boot,bws=bw$bw), view = "fixed")

Following this hint from @coffeejunky, the following is a possible solution to your problem with 6 variables:

## Generate some test data.
somedata = data.frame(replicate(10, runif(100, 0, 1)))
## Conditional variables.
X <- data.frame(somedata[, c('X1', 'X2', 'X3')])
## Dependent variables.
Y <- data.frame(somedata[, c('X4', 'X5', 'X6')])
## First compute the bandwidth vector
n <- nrow(somedata)
bw <- npudensbw(~X$X1 + X$X2 + X$X3 + Y$X4 + Y$X5 + Y$X6, ckertype = "gaussian")
plot(bw, view = "fixed", ylim = c(0, 3))
## Next generate draws from the kernel density (Gaussian)
n.boot <- 1000
i.boot <- sample(1:n, n.boot, replace=TRUE)
x1.boot <- rnorm(n.boot, X$X1[i.boot], bw$bw[1])
x2.boot <- rnorm(n.boot, X$X2[i.boot], bw$bw[2])
x3.boot <- rnorm(n.boot, X$X3[i.boot], bw$bw[3])
x4.boot <- rnorm(n.boot, Y$X4[i.boot], bw$bw[4])
x5.boot <- rnorm(n.boot, Y$X5[i.boot], bw$bw[5])
x6.boot <- rnorm(n.boot, Y$X6[i.boot], bw$bw[6])
## Plot the density for the bootstrap sample using the original
## bandwidths
ob1 <- npudens(~x1.boot + x2.boot + x3.boot + x4.boot + x5.boot + x6.boot, bws = bw$bw)
plot(ob1, view = "fixed", ylim = c(0, 3))
trosendal
  • 1,225
  • 9
  • 14
  • This example samples from a kernel unconditional density estimate (using `npudensbw`), but not for a kernel conditional density estimate (which would be using `npcdensbw`). Maybe there is a simple way to adapt this code to try this, but I don't see it well documented in the `np` help files and would value a clear answer for this particular facet of the question. – Nicholas G Reich Aug 21 '17 at 00:30