
I want to use my Bayesian network as a classifier, first on complete evidence data (predict), but also on incomplete data (bnlearn::cpquery). It seems, however, that even with the same evidence the two functions give different results (differences too large to be explained by sampling variation alone).

With complete data, one can easily use R's predict function:

predict(object = BN,
        node = "TargetVar",
        data = FullEvidence ,
        method = "bayes-lw",
        prob = TRUE)

By analyzing the prob attribute, I understood that the predict function simply chooses the factor level with the highest assigned probability.
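That behaviour can be checked with a small sketch. This uses bnlearn's built-in learning.test data set and a freshly fitted network as stand-ins (assumptions; substitute your own BN and evidence data frame):

```r
library(bnlearn)

dag <- hc(learning.test)            # learn a structure from the example data
fit <- bn.fit(dag, learning.test)   # fit the parameters

pred <- predict(object = fit,
                node = "F",
                data = learning.test,
                method = "bayes-lw",
                prob = TRUE)

# attr(pred, "prob") is a matrix with one row per factor level and one
# column per observation; predict() returns, for each column, the level
# with the highest probability (up to ties).
probs <- attr(pred, "prob")
manual <- rownames(probs)[apply(probs, 2, which.max)]
table(pred == manual)
```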

When it comes to incomplete evidence (the outcomes of only some nodes are known), predict no longer works:

    Error in check.fit.vs.data(fitted = fitted, 
                               data = data, 
                               subset = setdiff(names(fitted),  : 
    required variables [.....] are not present in the data.

So, I want to use bnlearn::cpquery with a list of known evidence:

cpquery(fitted = BN, 
        event = TargetVar == "TRUE", 
        evidence = evidenceList, 
        method = "lw",
        n = 100000)

Again, I simply want to use the factor level with the highest probability as the prediction. So if the result of cpquery is greater than 0.5, I set the prediction to TRUE, otherwise to FALSE.
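That thresholding step can be sketched as follows. The node names and evidence values are placeholders, and BN stands for the fitted network from above; note that with method = "lw", evidence must be a named list of observed values:

```r
# Hypothetical partial evidence: only some nodes are observed.
evidenceList <- list(NodeA = "a", NodeB = "c")   # placeholder names/values

p_true <- cpquery(fitted = BN,
                  event = (TargetVar == "TRUE"),
                  evidence = evidenceList,
                  method = "lw",
                  n = 100000)

# Pick the more probable level of the binary target.
prediction <- p_true > 0.5
```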

I tried to monitor the process by feeding the same (complete) data to both functions, but they do not return the same results. There are large differences, e.g. predict's prob attribute gives me p(FALSE) = 27% whereas cpquery gives me p(FALSE) = 2.2%.

What is the "right" way of doing this? Should I use only cpquery, even for complete data? And why are the differences so large?

Thanks for your help!

locom
  • Hi. So that your question is reproducible (and so that it is easier for you to get help), can you edit your question to use one of the bnlearn example datasets please? (Perhaps your probabilities would also be closer if you increased the number of samples in the predict call, i.e. n = 1e5.) – user20650 May 09 '18 at 10:58

1 Answer


As user20650 put it, increasing the number of samples in the predict call was the solution: it brings the two results very close together. So just provide the argument n = ... in your function call.
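A minimal sketch of the fix, reusing the call from the question (the number of likelihood-weighting samples used by predict with method = "bayes-lw" is fairly small by default, so raising it stabilises the probability estimates):

```r
predict(object = BN,
        node = "TargetVar",
        data = FullEvidence,
        method = "bayes-lw",
        prob = TRUE,
        n = 1e5)   # draw more samples for a less noisy estimate
```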

Of course that makes sense; I just didn't know about that argument of the predict() function. It isn't documented in the bn.fit utilities, nor in the rather generic documentation of predict.

locom