I am using the Weka Java API. I trained a BayesNet on an Instances object (data set) with the class (label) attribute unspecified.

/**
 * Initialization
 */
Instances data = ...;
BayesNet bn = new EditableBayesNet(data);
SearchAlgorithm learner = new TAN();
SimpleEstimator estimator = new SimpleEstimator();
/**
 * Training
 */
bn.initStructure();
learner.buildStructure(bn, data);
estimator.estimateCPTs(bn);

Suppose the Instances object data has three attributes, A, B and C, and the dependency discovered is B->A, C->B.

The trained BayesNet object bn is not meant for classification (I did not specify a class attribute for data); I just want to calculate the joint probability Pr(A=x, B=y). How do I get this probability from bn?

As far as I know, the distributionForInstance function of BayesNet may be the closest thing to use. It returns the probability distribution for a given instance (in our case, the instance is (A=x, B=y)). To use it, I could create a new Instance object testDataInstance, set the values A=x and B=y, and call distributionForInstance with testDataInstance.

/**
 * Obtain Pr(A="x", B="y")
 */ 
Instance testDataInstance = new SparseInstance(3);
Instances testDataSet = new Instances(
            bn.m_Instances);
testDataSet.clear();
testDataInstance.setValue(testDataSet.attribute("A"), "x");
testDataInstance.setValue(testDataSet.attribute("B"), "y");
testDataSet.add(testDataInstance);
bn.distributionForInstance(testDataSet.firstInstance());

However, to my knowledge, the returned distribution gives the probabilities of all possible values of the class attribute in the BayesNet. As I did not specify a class attribute for data, it is unclear to me what the returned probability distribution means.

Mark Jin

1 Answer

The javadoc page for distributionForInstance says that it calculates the class membership probabilities: http://weka.sourceforge.net/doc.dev/weka/classifiers/bayes/BayesNet.html#distributionForInstance-weka.core.Instance-

So that's probably not what you want. I think you can use getDistribution(int nTargetNode) or getDistribution(java.lang.String sName) to get what you need.

P(A=x, B=y) can be calculated as follows,

P(A=x|B=y) = P(A=x, B=y)/P(B=y), which implies,

P(A=x, B=y) = P(A=x|B=y)*P(B=y)
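
One step is worth spelling out. In the network described in the question, B has C as its parent, so the table stored for B is P(B|C) rather than P(B), and P(B=y) has to be obtained by summing C out. A sketch of the derivation, assuming the structure is exactly C -> B -> A (so that A's table entry for B=y is P(A=x|B=y)):

```latex
\begin{align*}
P(B=y) &= \sum_{c} P(B=y \mid C=c)\, P(C=c) \\
P(A=x, B=y) &= P(A=x \mid B=y)\, P(B=y) \\
            &= P(A=x \mid B=y) \sum_{c} P(B=y \mid C=c)\, P(C=c)
\end{align*}
```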

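Here is some pseudocode that illustrates my approach:

// bn was created as an EditableBayesNet; the cast makes its getDistribution methods reachable
EditableBayesNet ebn = (EditableBayesNet) bn;
double[][] AP = ebn.getDistribution("A"); // gives P(A|B) table, one row per value of B
double[][] BP = ebn.getDistribution("B"); // gives P(B|C) table, one row per value of C
double[][] CP = ebn.getDistribution("C"); // gives P(C), a single row because C has no parents
double BPy = 0;

// I am assuming x and y are int value indices; if they are value names,
// they first have to be mapped to the corresponding indices.
// AP[y][x] represents P(A=x|B=y) and BP[c][y] represents P(B=y|C=c).
// Sum C out: P(B=y) = sum over c of P(B=y|C=c) * P(C=c)
for (int c = 0; c < BP.length; c++) {
    BPy += BP[c][y] * CP[0][c];
}
// BPy now contains P(B=y)
System.out.println(AP[y][x] * BPy);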
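
To check the arithmetic without Weka on the classpath, here is a minimal plain-Java sketch of the same computation. It assumes the tables are laid out as [parent configuration][value], which is how the pseudocode above indexes them, and that the structure is C -> B -> A. The class name JointFromCpts, its method names, and the numbers in main are made up for illustration; they are not part of Weka.

```java
/** Sketch: joint P(A=x, B=y) from CPTs laid out as [parent configuration][value]. */
final class JointFromCpts {

    /** P(B=y) = sum over c of P(B=y | C=c) * P(C=c). */
    static double marginalOfChild(double[][] childGivenParent, double[] parentMarginal, int y) {
        double p = 0.0;
        for (int c = 0; c < parentMarginal.length; c++) {
            p += childGivenParent[c][y] * parentMarginal[c];
        }
        return p;
    }

    /** P(A=x, B=y) = P(A=x | B=y) * P(B=y) for the chain C -> B -> A. */
    static double joint(double[][] aGivenB, double[][] bGivenC, double[][] cTable, int x, int y) {
        // cTable has a single row because C has no parents.
        return aGivenB[y][x] * marginalOfChild(bGivenC, cTable[0], y);
    }

    public static void main(String[] args) {
        // Hypothetical CPTs for three binary attributes (illustrative numbers only).
        double[][] cp = {{0.6, 0.4}};             // P(C)
        double[][] bp = {{0.7, 0.3}, {0.2, 0.8}}; // P(B | C), row = value of C
        double[][] ap = {{0.9, 0.1}, {0.5, 0.5}}; // P(A | B), row = value of B

        // P(A=0, B=1) = 0.5 * (0.3 * 0.6 + 0.8 * 0.4)
        System.out.println(joint(ap, bp, cp, 0, 1));
    }
}
```

With real data, the three arrays would come from the getDistribution calls above instead of being hard-coded. A quick sanity check is that the joint summed over every (x, y) pair comes out to 1.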
mettleap
  • Thanks @mettleap. This is exactly what I thought. We need the conditional probability P(A=x|B=y) and the marginal probability P(B=y) to get the joint probability P(A=x, B=y). I found that BayesNet has a function "getMargin" that is supposed to return the marginal probability distribution of a given node, which seems to be an alternative way to get BPy. However, "getMargin" returns all zeros for every node. Do you know why that is? – Mark Jin Nov 27 '18 at 02:58
  • @Zhongjun'Mark'Jin, I think you have to call estimateCPTs(BayesNet bayesNet) in the SimpleEstimator class first so that it fills the CPTs; then maybe it will give the correct values. Also, there is an alpha parameter that you have to set for the SimpleEstimator, which will decide the actual values in the CPTs. – mettleap Nov 27 '18 at 05:39
  • Thanks. I did run estimateCPTs (I forgot to include it in the post earlier). I did not explicitly set alpha for the estimator, so it was left at its default of 0.5. – Mark Jin Nov 27 '18 at 07:11
  • Btw, I created a new thread at https://stackoverflow.com/questions/53494595/weka-why-getmargin-returns-all-zeros for this question. – Mark Jin Nov 27 '18 at 07:22