I want to use my Bayesian network as a classifier, first on complete evidence data (predict), but also on incomplete data (bnlearn::cpquery). However, even when working with the same evidence, the two functions give different results (and not just small deviations due to sampling).
With complete data, one can easily use R's predict function:
predict(object = BN,
        node = "TargetVar",
        data = FullEvidence,
        method = "bayes-lw",
        prob = TRUE)
By analyzing the prob attribute, I understood that the predict function simply chooses the factor level with the highest assigned probability.
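That behaviour can be checked directly from the prob attribute. Here is a sketch of the check, assuming BN, "TargetVar" and FullEvidence are my fitted network, target node and complete data (attr(pred, "prob") is a matrix with one row per factor level and one column per observation):

```r
library(bnlearn)

pred <- predict(object = BN,
                node = "TargetVar",
                data = FullEvidence,
                method = "bayes-lw",
                prob = TRUE)

# one row per factor level, one column per observation
probs <- attr(pred, "prob")

# taking the row with the highest probability in each column
# should reproduce the predicted factor levels
manual <- rownames(probs)[apply(probs, 2, which.max)]
all(manual == as.character(pred))
```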
When it comes to incomplete evidence (only the outcomes of some nodes are known), predict doesn't work anymore:
Error in check.fit.vs.data(fitted = fitted, data = data, subset = setdiff(names(fitted), :
  required variables [.....] are not present in the data.
So I want to use bnlearn::cpquery with a list of the known evidence:
cpquery(fitted = BN,
event = TargetVar == "TRUE",
evidence = evidenceList,
method = "lw",
n = 100000)
Again, I simply want to use the factor level with the highest probability as the prediction: if the result of cpquery is higher than 0.5, I set the prediction to TRUE, otherwise to FALSE.
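As a sketch of that thresholding step (the node names and values in evidenceList are hypothetical placeholders for whichever nodes happen to be observed; my target is a two-level factor):

```r
library(bnlearn)

# only the observed nodes go into the evidence list
# (hypothetical nodes and values)
evidenceList <- list(NodeA = "high", NodeB = "low")

pTrue <- cpquery(fitted = BN,
                 event = (TargetVar == "TRUE"),
                 evidence = evidenceList,
                 method = "lw",
                 n = 100000)

# threshold the estimated probability at 0.5
prediction <- ifelse(pTrue > 0.5, "TRUE", "FALSE")
```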
I tried to sanity-check the process by giving the same (complete) data to both functions, but they don't return the same results. The differences are large: for example, predict's prob attribute gives me P(FALSE) = 27%, whereas cpquery gives me P(FALSE) = 2.2%.
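This is roughly how I ran the comparison on a single fully observed case (a sketch; row 1 of FullEvidence stands in for one complete observation, and the target node is excluded from the evidence list):

```r
library(bnlearn)

# one fully observed case as a one-row data frame
case <- FullEvidence[1, ]

# predict: probabilities are stored in the "prob" attribute
pred <- predict(BN, node = "TargetVar", data = case,
                method = "bayes-lw", prob = TRUE)
attr(pred, "prob")

# cpquery: the same case, passed as a named evidence list
# (every node except the target)
ev <- as.list(case[setdiff(names(case), "TargetVar")])
cpquery(fitted = BN,
        event = (TargetVar == "TRUE"),
        evidence = ev,
        method = "lw",
        n = 100000)
```

I would expect these two probability estimates to agree up to sampling noise, but they don't.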
What is the "right" way of doing this? Should I use only cpquery, even for complete data? And why are the differences so large?
Thanks for your help!