Given that Bayes' formula is:
P(A|B) = (P(B|A) * P(A)) / P(B)
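For concreteness, here is a minimal Python sketch of the formula. It assumes A means "the document is spam" and B is some observed feature of the document. The function name `posterior` and all probabilities below are hypothetical illustration values. Because P(B) is rarely known directly, the sketch expands it with the law of total probability.

```python
def posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Compute P(A|B) with Bayes' rule.

    P(B) is not supplied directly; it is expanded with the law of total
    probability: P(B) = P(B|A) * P(A) + P(B|not A) * (1 - P(A)).
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: A = "document is spam", B = "document contains some word".
# Assume the word appears in 80% of spam and 10% of ham, with a 1% spam rate.
print(posterior(p_b_given_a=0.8, p_a=0.01, p_b_given_not_a=0.1))
```

With these made-up numbers the posterior is only about 0.075: even a word that is eight times more common in spam leaves the document most likely ham, because the 1% prior is so small.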
Let's say that I want to train a classifier to label documents as spam or ham. Let's also say that in the real world about 1% of documents are spam, so for a given sample of inputs we would expect about 1% spam.
When I train my classifier, should I train it on documents that contain only 1% spam, or is it OK to train it with a much larger percentage of spam than I would expect to find in the real world?
I ask because, if the training set has a much larger percentage of spam, the value of P(A) (the prior probability of spam) is going to be abnormally large. Will this throw off my classifier, and in that case, would it classify some "ham" documents as "spam"?
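To illustrate the concern, here is a hedged sketch using Bayes' rule in odds form (posterior odds = prior odds × P(B|A) / P(B|not A)), which isolates the prior's effect. The function `posterior_from_odds` and the likelihood ratio of 3 are hypothetical; the two priors are the 1% real-world rate and an assumed 50/50 training set.

```python
def posterior_from_odds(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability.
    return post_odds / (1.0 + post_odds)


# Hypothetical: this document's words are 3x more likely under spam than under ham.
LIKELIHOOD_RATIO = 3.0

for prior in (0.01, 0.5):  # real-world spam rate vs. an assumed 50/50 training set
    p = posterior_from_odds(prior, LIKELIHOOD_RATIO)
    # Binary decision: pick spam when P(spam|doc) > P(ham|doc), i.e. p > 0.5.
    label = "spam" if p > 0.5 else "ham"
    print(f"P(spam)={prior:.2f} -> P(spam|doc)={p:.3f} -> {label}")
```

With these made-up numbers the same document gets a posterior of about 0.029 under the 1% prior (ham) but 0.75 under the 50% prior (spam), which is the kind of shift the question is asking about.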