According to several people on StackOverflow Bayesian filtering is better than Neural Networks for detecting spam.
According to the literature I've read that shouldn't be the case. Please explain!
There is no mathematical proof or explanation for why applications of Neural Networks have not been as good at detecting spam as Bayesian filters. That does not mean Neural Networks could not produce similar or better results, but the time it takes to tweak the network topology and train it to get even approximately the same results as a Bayesian filter is simply not justified. At the end of the day, people care about results and about minimizing the time and effort spent achieving them. When it comes to spam detection, Bayesian filters get you the best results with the least effort and time. If a spam detection system using Bayesian filters already detects 99% of spam correctly, there is very little incentive to spend a lot of time adjusting Neural Networks just to eke out an extra 0.5% or so.
"According to the literature I've read that shouldn't be the case."
It's technically correct. If properly configured, a Neural Network would get results as good as or even better than Bayesian filters, but it's the cost/benefit ratio that makes the difference and ultimately sets the trend.
Neural Networks work mostly as a black-box approach. You determine your inputs and outputs. After that, finding a suitable architecture (a multilayer perceptron with 2 hidden layers, an RBF network, etc.) is done mostly empirically. There are suggestions for determining the architecture, but they are, well, suggestions. This is good for some problems, since we, as domain analysts, do not have enough information about the problem itself. The ability of an NN to find an answer on its own is desirable in those cases.
A Bayesian network, on the other hand, is designed mostly by the domain analyst. Since spam classification is a well-known problem, a domain analyst can tweak the architecture more easily, so a Bayesian network gets good results with less effort.
Also, most NNs do not cope well with changing features, so they almost always need to be retrained, which is an expensive operation. A Bayesian network, on the other hand, may only need its probabilities updated.
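To make the "only change probabilities" point concrete, here is a minimal sketch. It assumes a naive Bayes classifier with add-one smoothing, which is the simplest kind of Bayesian spam filter and not a full Bayesian network. The class name `NaiveBayesSpamFilter` and its methods are made up for illustration, and the tokenizer is just a whitespace split.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy naive Bayes filter (hypothetical names, add-one smoothing)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def update(self, text, label):
        # Incremental learning: only the counts change, nothing is refit.
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def _log_score(self, words, label, vocab_size):
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        # Smoothed log prior for the class (two classes, hence the +2).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for word in words:
            # Smoothed log likelihood; unseen words count as zero.
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = len(vocab) + 1  # one extra slot for unseen words
        spam_score = self._log_score(words, "spam", vocab_size)
        ham_score = self._log_score(words, "ham", vocab_size)
        return spam_score > ham_score


spam_filter = NaiveBayesSpamFilter()
spam_filter.update("cheap pills buy now", "spam")
spam_filter.update("meeting agenda for monday", "ham")
print(spam_filter.is_spam("buy cheap pills"))   # True
print(spam_filter.is_spam("crypto giveaway"))   # False: words never seen yet

# A new kind of spam shows up: one cheap update, no retraining pass.
spam_filter.update("crypto giveaway claim now", "spam")
print(spam_filter.is_spam("crypto giveaway"))   # True
```

Adapting to a new feature here is one call to `update`, which only increments counters. A neural network facing new vocabulary would typically need its input layer resized and another training pass.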