I'm trying to determine the probability that a new record belongs to an existing data set. I'm using the bnlearn R package to build a Bayesian network from a large training set.

I then want to assess how anomalous a new record is. For this I want to get a probability for a record for which I have full evidence but don't need to predict any variable.

The `cpquery` method seems to require at least one variable to predict. The documentation states that the `predict` method will ignore entries with full evidence.

I spent a day searching the bnlearn documentation without success, so I think this is either not possible with bnlearn or I'm missing the right vocabulary to find what I need in the docs.

Any insights from someone with more experience with bnlearn are welcome.

Michal T
  • Michal ; given your other question, you have found the `pEvidence` function - please consider writing that up as an answer here as someone else may find it useful. – user20650 May 05 '19 at 12:41
  • I would, but the pEvidence function doesn't work as I'm expecting it to work for the unknown evidence values. I'll first need to understand why exactly is it behaving as it is. For now the probability goes drastically up as soon as there is an unknown evidence value. – Michal T May 05 '19 at 22:25

1 Answer

The `cpquery` function estimates the conditional probability of an event given some evidence. However, the bnlearn documentation states:

If either event or evidence is set to TRUE an unconditional probability query is performed with respect to that argument.

For example, with the asia dataset:

library(bnlearn)

data(asia)

bn.dag <- model2network("[A][S][T|A][L|S][B|S][D|B:E][E|T:L][X|E]")
bn.fitted  <- bn.fit(bn.dag, asia)

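# cpquery() is sampling-based, so each call returns a slightly different
# estimate; repeat the query to see the spread.
prob <- numeric(1000)  # preallocate, otherwise prob[i] fails on the first pass

for (i in seq_along(prob)) {
  # evidence = TRUE: unconditional query, i.e. the joint probability of the event
  prob[i] <- cpquery(bn.fitted, 
                     event = (A == "no") & (S == "no") & (T == "no") & (L == "no") & 
                             (B == "no") & (E == "no") & (X == "no") & (D == "no"), 
                     evidence = TRUE)
}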

summary(prob)

# Result:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2714  0.2864  0.2908  0.2909  0.2954  0.3132 
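
As a cross-check on what this query estimates: with `evidence = TRUE` and an event that fixes every node, the quantity being approximated is the joint probability of the full record. For the DAG defined above, that joint probability factorises into one term per node given its parents. This is only a sketch of the standard Bayesian network factorisation applied to that model string, not output from bnlearn:

```latex
\begin{aligned}
P(A, S, T, L, B, E, X, D)
  &= P(A)\, P(S)\, P(T \mid A)\, P(L \mid S)\, P(B \mid S) \\
  &\quad \times P(E \mid T, L)\, P(X \mid E)\, P(D \mid B, E)
\end{aligned}
```

Each factor corresponds to one conditional probability table in the fitted network, so the exact value for a fully observed record is the product of eight table entries. The spread between Min. and Max. in the summary above comes from the sampling in `cpquery`, not from the model itself.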
Flavia