I am using the Weka Java API. I trained a BayesNet on an Instances object (data set) without specifying a class (label) attribute.
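import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.net.EditableBayesNet;
import weka.classifiers.bayes.net.estimate.SimpleEstimator;
import weka.classifiers.bayes.net.search.SearchAlgorithm;
import weka.classifiers.bayes.net.search.local.TAN;
import weka.core.Instances;

/**
* Initialization
*/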
Instances data = ...;
BayesNet bn = new EditableBayesNet(data);
SearchAlgorithm learner = new TAN();
SimpleEstimator estimator = new SimpleEstimator();
/**
* Training
*/
bn.initStructure();
learner.buildStructure(bn, data);
estimator.estimateCPTs(bn);
Suppose the Instances object data has three attributes, A, B and C, and the dependencies discovered are B->A and C->B. The trained BayesNet object bn is not meant for classification (I did not specify a class attribute for data); I just want to calculate the joint probability Pr(A=x, B=y). How do I get this probability from bn?
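To make the target quantity concrete: with the structure C->B->A, the joint distribution factorizes as Pr(A, B, C) = Pr(C) Pr(B|C) Pr(A|B), so Pr(A=x, B=y) is what remains after summing C out. The following is only a minimal plain-Java sketch of that computation, not Weka code. The class name JointProbabilitySketch, the method jointAB, and all CPT numbers are made up for illustration, and every attribute is assumed to be binary.

```java
/** Hand-computed joint Pr(A, B) for the network C -> B -> A, with C marginalized out. */
class JointProbabilitySketch {

    // Hypothetical CPTs (made-up numbers, not learned from any data set).
    // Index 0/1 stands for the first/second nominal value of each attribute.
    static final double[] P_C = {0.6, 0.4};                  // Pr(C=c)
    static final double[][] P_B_GIVEN_C = {{0.7, 0.3},       // Pr(B=b | C=0)
                                           {0.2, 0.8}};      // Pr(B=b | C=1)
    static final double[][] P_A_GIVEN_B = {{0.9, 0.1},       // Pr(A=a | B=0)
                                           {0.5, 0.5}};      // Pr(A=a | B=1)

    /** Pr(A=a, B=b) = Pr(A=a | B=b) * sum over c of Pr(B=b | C=c) * Pr(C=c). */
    static double jointAB(int a, int b) {
        double marginalB = 0.0;
        for (int c = 0; c < P_C.length; c++) {
            marginalB += P_B_GIVEN_C[c][b] * P_C[c];
        }
        return P_A_GIVEN_B[b][a] * marginalB;
    }

    public static void main(String[] args) {
        // Joint probability of the first value of A together with the second value of B.
        System.out.println(jointAB(0, 1));
    }
}
```

What I am looking for is the equivalent of jointAB on the trained bn, using the CPTs that the estimator has already filled in rather than hand-written tables.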
As far as I know, the distributionForInstance function of BayesNet may be the closest thing to use. It returns the probability distribution for a given instance (in our case, the instance is (A=x, B=y)). To use it, I could create a new Instance object testDataInstance, set the values A=x and B=y, and call distributionForInstance with testDataInstance.
/**
* Obtain Pr(A="x", B="y")
*/
Instance testDataInstance = new SparseInstance(3);
// Copy only the header (attribute definitions) of the training data, with no rows.
Instances testDataSet = new Instances(bn.m_Instances, 0);
testDataInstance.setValue(testDataSet.attribute("A"), "x");
testDataInstance.setValue(testDataSet.attribute("B"), "y");
testDataSet.add(testDataInstance);
bn.distributionForInstance(testDataSet.firstInstance());
However, to my knowledge, the returned distribution gives the probabilities of all possible values of the class attribute in the Bayes net. As I did not specify a class attribute for data, it is unclear to me what the returned probability distribution means.