I am using Naive Bayes for text classification.
Assume that my vocabulary is ["apple", "boy", "cup"] and the class label is "spam" or "ham". Each document is converted to a 3-dimensional 0-1 vector. For example, "apple boy apple apple" is converted to [1, 1, 0].
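For concreteness, here is a minimal sketch of that conversion (the vocabulary is from my example; the helper name is just for illustration):

```python
VOCAB = ["apple", "boy", "cup"]

def to_binary_vector(document, vocab=VOCAB):
    """Map a document to a 0-1 vector: 1 if the vocab word occurs, else 0."""
    words = set(document.split())
    return [1 if w in words else 0 for w in vocab]

print(to_binary_vector("apple boy apple apple"))  # [1, 1, 0]
```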
Now I have calculated the conditional probabilities p("apple"|"spam"), p("apple"|"ham"), p("boy"|"spam"), etc. from the training examples.
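The way I estimate those probabilities is roughly like this (a sketch; the tiny training set and the add-one smoothing are made up for illustration):

```python
from collections import defaultdict

VOCAB = ["apple", "boy", "cup"]

def estimate_word_probs(labeled_docs, vocab=VOCAB):
    """p(word | class): fraction of class documents containing the word,
    with add-one smoothing so no estimate is exactly 0 or 1."""
    n_docs = defaultdict(int)        # documents per class
    n_with_word = defaultdict(int)   # keyed by (word, class)
    for doc, label in labeled_docs:
        n_docs[label] += 1
        for w in set(doc.split()) & set(vocab):
            n_with_word[(w, label)] += 1
    return {(w, c): (n_with_word[(w, c)] + 1) / (n_docs[c] + 2)
            for w in vocab for c in n_docs}

train = [("apple apple cup", "spam"), ("boy cup", "ham"), ("apple boy", "ham")]
probs = estimate_word_probs(train)
print(probs[("apple", "ham")])  # (1 + 1) / (2 + 2) = 0.5
```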
To classify a test document as spam or ham, e.g. "apple boy" -> [1, 1, 0], we need to compute p(features | classLabel).
Using conditional independence, for the test vector [1, 1, 0] I know of two formulas:
(1) p(features|"ham") = p("apple"|"ham")p("boy"|"ham")
(2) p(features|"ham") = p("apple"|"ham")p("boy"|"ham")(1-p("cup"|"ham"))
Which formula is right?
I believe (2) is right because we have 3 features (the 3 words in the vocabulary), but I have seen code written by others that uses (1). The term 1 - p("cup"|"ham") is usually close to 1, so it won't make much difference in practice, but I want the exact answer.
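To make the comparison concrete, here is a sketch computing both versions for the test vector [1, 1, 0] (the probability values are made up for illustration):

```python
VOCAB = ["apple", "boy", "cup"]

# hypothetical estimates p(word | "ham"), just for illustration
p_ham = {"apple": 0.4, "boy": 0.3, "cup": 0.05}

x = [1, 1, 0]  # binary vector for "apple boy"

# (1) multiply probabilities of the present words only
lik1 = 1.0
for word, present in zip(VOCAB, x):
    if present:
        lik1 *= p_ham[word]

# (2) also multiply in (1 - p(word | "ham")) for each absent word
lik2 = 1.0
for word, present in zip(VOCAB, x):
    lik2 *= p_ham[word] if present else (1 - p_ham[word])

print(lik1)  # 0.4 * 0.3 = 0.12
print(lik2)  # 0.4 * 0.3 * 0.95 = 0.114
```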