
I would like to draw samples from a conditional Bayesian network (i.e. some input nodes without parents have no distributions attached), given values for the input nodes, with bnlearn. The solutions I have tried are inefficient; I would expect this to be about as efficient as rbn, since forward sampling is possible in this situation.

For instance, if I have the discrete network A -> B, I want to draw random values for B given a data vector for A. I can do that with impute (very inefficient), or generate random data with rbn and filter it with my input conditions, picking a random matching row (also inefficient, since many samples are thrown away; a sketch of this filtering approach follows the network definition below).

For instance, here is a basic A -> B network:

library(bnlearn)

# Basic custom network
net <- as.bn("[A][B|A]")
# A is my input node.
cptA <- matrix(c(0.5, 0.5), ncol = 2, dimnames = list(NULL, c("blue", "red")))
cptB <- matrix(c(0.3, 0.7, 0.8, 0.2), ncol = 2, 
  dimnames = list("B" = c("bad", "good"), "A" = c("blue", "red")))
bnfit <- custom.fit(net, dist = list(A = cptA, B = cptB))
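For reference, here is a minimal sketch of the "rbn + filter" approach mentioned above; a_input is just a hypothetical toy input vector for A, and the pool size is an arbitrary oversampling choice:

# Sketch of the "generate with rbn, then filter on A" approach: oversample the
# whole network, then for each input value of A pick a random pooled row whose
# A matches. 'a_input' and the pool size are illustrative choices only.
a_input <- factor(c(rep("blue", 3), rep("red", 7)), levels = c("blue", "red"))
pool <- rbn(bnfit, 1000)
b_filtered <- sapply(as.character(a_input), function(a)
  sample(as.character(pool$B[pool$A == a]), 1))
b_filtered <- factor(b_filtered, levels = levels(pool$B))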

And here is an attempt with "impute":

library(tictoc)
n <- 100000
data <- data.frame(A = factor(c(rep("blue", 0.3 * n), rep("red", 0.7 * n)), levels = c("blue", "red")),
                   B = factor(rep(NA, n), levels = c("bad", "good")))

# A bit hacky: disable check.data(), which throws an error when a column is fully missing
check.data <- function(...){}
assignInNamespace("check.data", check.data, ns = "bnlearn")

tic(); r <- impute(bnfit, data, n = 1); toc()
# 53.165 sec elapsed

Compared to:

tic(); r2 <- rbn(bnfit, 100000); toc()
# 0.026 sec elapsed

Have I missed something or should I implement this by hand?

EDIT: following user20650's comment, I tried predict, with much better performance than impute. I am not sure how, because the documentation says that "‘impute()’ is based on ‘predict()’".

tic(); p <- predict(bnfit, data = data, node = "B", method = "bayes-lw", n = 1); toc()
# 0.374 sec elapsed

A much more reasonable time, but still 15 times slower than rbn (150 times slower with 1M rows). I am still a bit worried about how it scales with network size; I will investigate a little.
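For what it is worth, this is roughly what an implementation by hand would look like for this particular two-node case, sampling B directly from its CPT. It is only a sketch: sample_B_given_A is a hypothetical helper, and it assumes the CPT layout produced by custom.fit above (rows = levels of B, columns = levels of A).

# Hand-rolled forward sampling for A -> B: draw B from the column of its CPT
# that corresponds to each observed value of A, one level of A at a time.
sample_B_given_A <- function(fit, a) {
  cpt <- fit$B$prob                  # conditional probability table of B
  out <- character(length(a))
  for (lvl in levels(a)) {
    idx <- which(a == lvl)
    out[idx] <- sample(rownames(cpt), length(idx), replace = TRUE, prob = cpt[, lvl])
  }
  factor(out, levels = rownames(cpt))
}

tic(); b_manual <- sample_B_given_A(bnfit, data$A); toc()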

  • I suspect you want to do something with `predict`. You can return the probabilities and then resample, or perhaps use the `"bayes-lw"` method. – user20650 Jun 17 '21 at 22:24
  • Thank you, it was helpful. I edited my question with a test of "predict", which improved performance over impute ("prob" would break with mixed discrete/continuous networks). It seems predict still does more under the hood than I would expect. – PierreC. Jun 18 '21 at 15:36
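For completeness, here is a sketch of the probability-resampling variant suggested in the first comment, assuming a purely discrete network and that the probabilities returned by predict(..., prob = TRUE) come back as a levels-by-observations matrix in the "prob" attribute (as noted in the reply, this approach would not carry over to mixed discrete/continuous networks):

# Ask predict() for the probabilities of B and resample by hand. Assumes the
# "prob" attribute is a matrix with one row per level of B and one column per
# observation.
p3 <- predict(bnfit, node = "B", data = data, method = "bayes-lw", n = 1, prob = TRUE)
probs <- attr(p3, "prob")
b_resampled <- factor(apply(probs, 2, function(pr)
  sample(rownames(probs), 1, prob = pr)), levels = rownames(probs))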
