I would like to draw samples from a conditional Bayesian network (i.e. one where some parentless input nodes have no distribution attached), given the values of the input nodes, using bnlearn. The solutions I have tried are inefficient. I would expect this to be as efficient as rbn, given that forward sampling is possible in this situation.
For instance, if I have the discrete network A -> B, I want to draw random values for B given a data vector for A. I can do that with impute (very inefficient), or by generating random data with rbn, filtering on my input conditions and picking a random row (also inefficient, as a lot of samples are thrown away).
Here is a basic A -> B network:
library(bnlearn)
# Basic custom network
net <- as.bn("[A][B|A]")
# A is my input node.
cptA <- matrix(c(0.5, 0.5), ncol = 2, dimnames = list(NULL, c("blue", "red")))
cptB <- matrix(c(0.3, 0.7, 0.8, 0.2), ncol = 2,
               dimnames = list("B" = c("bad", "good"), "A" = c("blue", "red")))
bnfit <- custom.fit(net, dist = list(A = cptA, B = cptB))
And here is an attempt with "impute":
library(tictoc)
n <- 100000
data <- data.frame(
  A = factor(c(rep("blue", 0.3 * n), rep("red", 0.7 * n)), levels = c("blue", "red")),
  B = factor(rep(NA, n), levels = c("bad", "good")))
# A bit hacky: disable check.data, which throws an error when a column is entirely missing
check.data <- function(...){}
assignInNamespace("check.data", check.data, ns = "bnlearn")
tic(); r <- impute(bnfit, data, n = 1); toc()
# 53.165 sec elapsed
Compared to:
tic(); r2 <- rbn(bnfit, 100000); toc()
# 0.026 sec elapsed
Have I missed something or should I implement this by hand?
EDIT: following user20650's comment, I tried predict, which performs much better than impute. I am not sure why, because the documentation says that "‘impute()’ is based on ‘predict()’".
tic(); p <- predict(bnfit, data = data, node = "B", method = "bayes-lw", n = 1); toc()
# 0.374 sec elapsed
A much more reasonable time, but still about 15 times slower than rbn (150 times slower with 1M rows). I am still a bit worried about how it scales with network size. I will investigate a little.
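For reference, the "by hand" forward sampling mentioned above could look roughly like the sketch below for the A -> B example. It is only an illustration under stated assumptions: sample_child is a hypothetical helper, not a bnlearn function, and it handles a single discrete parent by drawing one vectorised batch per parent level from the matching column of the CPT.

```r
# Hypothetical helper (not part of bnlearn): sample a discrete child node
# given observed values of its single discrete parent.
# `cpt` is a matrix with child levels as rows and parent levels as columns.
sample_child <- function(cpt, parent_values) {
  child_levels <- rownames(cpt)
  out <- character(length(parent_values))
  # One vectorised sample() call per parent level, rather than one per row.
  for (lv in colnames(cpt)) {
    idx <- which(parent_values == lv)
    out[idx] <- sample(child_levels, length(idx), replace = TRUE,
                       prob = cpt[, lv])
  }
  factor(out, levels = child_levels)
}

# Same CPT for B as in the example network above.
cptB <- matrix(c(0.3, 0.7, 0.8, 0.2), ncol = 2,
               dimnames = list(B = c("bad", "good"), A = c("blue", "red")))

set.seed(1)
n <- 100000
# Input vector for A: 30% "blue", 70% "red".
A <- factor(rep(c("blue", "red"), times = c(30000, 70000)),
            levels = c("blue", "red"))
B <- sample_child(cptB, A)

# Empirical P(B | A); each column should be close to the matching CPT column.
print(prop.table(table(B, A), margin = 2))
```

With a fitted network, the CPT could presumably be taken from bnfit$B$prob instead of being retyped. A node with several parents would need the parent configurations indexed jointly, and a larger network would need the nodes visited in topological order, which this sketch does not attempt.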