1

I am relatively beginner in R and trying to figure out how to use cpquery function for bnlearn package for all edges of DAG.

First of all, I created a bn object, a network of bn and a table with all strengths.

library(bnlearn)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)

Then I tried to create a new variable in sttbl dataset, which was the result of cpquery function.

sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
sttbl[1,4] = cpquery(fit, `A` == 1, `D` == 1)

It looks pretty good (especially on bigger data), but when I am trying to automate this process somehow, I am struggling with errors, such as:

Error in sampling(fitted = fitted, event = event, evidence = evidence, : logical vector for evidence is of length 1 instead of 10000.

In perfect situation, I need to create a function that fills the prob generated variable of sttbl dataset regardless it's size. I tried to do it with for loop to, but stumbled over the error above again and again. Unfortunately, I am deleting failed attempts, but they were smt like this:

for (i in 1:nrow(sttbl)) {
     j = sttbl[i,1]
     k = sttbl[i,2]
     sttbl[i,4]=cpquery(fit, fit$j %in% sttbl[i,1]==1, fit$k %in% sttbl[i,2]==1)
}

or this:

for (i in 1:nrow(sttbl)) {
     sttbl[i,4]=cpquery(fit, sttbl[i,1] == 1, sttbl[i,2] == 1)
}

Now I think I misunderstood something in R or bnlearn package.

Could you please tell me how to realize this task with filling the column by multiple cpqueries? That would help me a lot with my research!

Alex Popov
  • 15
  • 1
  • 7
  • Hi, can you show how you are trying to automate this please (as the code you show works) – user20650 Jun 01 '20 at 00:50
  • @user20650 I was completely sure I am mistaken, so I've deleted all my attempts. Although I remembered 2 examples and completed the question by them. – Alex Popov Jun 01 '20 at 14:56
  • thanks Alex; I'll post something.it is difficult to use bnlearn inference tools in a loop. – user20650 Jun 01 '20 at 15:08

1 Answers1

1

cpquery is quite difficult to work with programmatically. If you look at the examples in the help page you can see the author uses eval(parse(...)) to build the queries. I have added two approaches below, one using the methods from the help page and one using cpdist to draw samples and reweighting to get the probabilities.

Your example

library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)

This uses cpquery and the much maligned eval(parse(...)) -- this is the approach the the bnlearn author takes to do this programmatically in the ?cpquery examples. Anyway,

# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "\'a\'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)

# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for(i in 1:nrow(sttbl)) {
  qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
  sttbl$prob[i] = eval(parse(text=qtxt))
}

I find it preferable to work with cpdist which is used to generate random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method="lw") it is slightly easier to do this programatically (and without evil(parse(...))). The evidence is added in a named list i.e. list(A='a').

# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.  
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)

# Now loop though the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for(i in 1:nrow(sttbl)) {
  temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
  w = attr(temp, "weights")  
  sttbl$prob2[i] = sum(w[temp=='a'])/ sum(w)
}

sttbl
#   from to   strength      prob     prob2
# 1    A  D -1938.9499 0.6186238 0.6233387
# 2    A  B -1153.8796 0.6050552 0.6133448
# 3    C  D  -823.7605 0.7027782 0.7067417
# 4    B  E  -720.8266 0.7332107 0.7328657
# 5    F  E  -549.2300 0.5850828 0.5895373
user20650
  • 24,654
  • 5
  • 56
  • 91
  • your methods work perfectly well for my binary data! (as you noticed c:). However, when I try to make it function with _sttbl_ and _fit_ as variables (to make the loop applicable for different cases via one function) it's not working. Here what I tried: `probs = function(sttable, fittedbn){ evidence = setNames(replicate(nrow(sttable), "1", simplify = FALSE), sttable$to) for(i in 1:nrow(sttable)) { temp = cpdist(fittedbn, sttable$from[i], evidence[i], method="lw") w = attr(temp, "weights") sttable$prob[i] = sum(w[temp==1])/ sum(w)}}`. Nothing happenes. – Alex Popov Jun 01 '20 at 23:45
  • R does not update by reference so you need to return `sttable` i.e. add `return(sttable)` before the final parenthesis – user20650 Jun 02 '20 at 00:06
  • 1
    ... although I think id maybe return the vector of probabilities and assign this to the data set. That is change the relevant lines to `prob[i]=sum(w[temp==1])/ sum(w) } ; return(prob)` and then run the function and assign it with `sttable$prob = probs(sttable, fit)` – user20650 Jun 02 '20 at 00:13