I'm building a spam detection system using neural networks, and I'm not sure how to proceed from what I currently have.
What I have so far: unread mails are flagged as read and converted to mail vectors using tf-idf weighting. So basically, each email message looks like this:
Email : (Word1,Score1),(Word2,Score2)...
This is the result after parsing, stemming, stopword removal, and tf-idf conversion. I have read about feedforward networks trained via backpropagation, and that seems to be the most commonly followed approach.
My questions: How do I further reduce the dimensionality of the vectors I have, and how do I feed them to the network as input? How does the hidden layer behave, and how does the number of hidden-layer neurons affect the performance of the neural network? Also, how is a feature vector different from what I have, and how do I form one?
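To make the questions concrete, here is a minimal sketch of one possible pipeline, not a definitive implementation. It assumes a feature vector is just the (word, score) pairs laid out as a dense, fixed-length list with one slot per vocabulary word, that dimensionality is reduced by simply keeping the words that occur in the most emails, and that the network has a single hidden layer trained with plain backpropagation. The names `build_vocabulary`, `to_feature_vector`, and `TinyNet`, the toy emails and labels, the hidden size, and the learning rate are all hypothetical and chosen only for illustration.

```python
import math
import random

def build_vocabulary(emails, max_features):
    """Keep the max_features words that occur in the most emails (a crude feature selection)."""
    doc_freq = {}
    for email in emails:
        for word, _ in email:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    # Highest document frequency first; ties broken alphabetically for determinism.
    ranked = sorted(doc_freq, key=lambda w: (-doc_freq[w], w))
    return {word: i for i, word in enumerate(ranked[:max_features])}

def to_feature_vector(email, vocab):
    """Turn sparse (word, score) pairs into a dense vector with one slot per vocabulary word."""
    vec = [0.0] * len(vocab)
    for word, score in email:
        if word in vocab:
            vec[vocab[word]] = score
    return vec

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """One hidden layer with sigmoid units and a single sigmoid output (estimated spam probability)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, y, lr):
        """One backpropagation update for a single example (cross-entropy loss)."""
        hidden, out = self.forward(x)
        d_out = out - y  # gradient of the loss w.r.t. the output pre-activation
        for j, h in enumerate(hidden):
            # Error propagated back to hidden unit j, computed before w2[j] changes.
            d_hidden = d_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * d_out * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
            self.b1[j] -= lr * d_hidden
        self.b2 -= lr * d_out

# Toy data (made up): each email is a list of (word, tf-idf score) pairs; 1 = spam, 0 = not spam.
emails = [
    [("free", 0.9), ("prize", 0.8), ("click", 0.5)],
    [("meeting", 0.7), ("agenda", 0.6), ("monday", 0.4)],
    [("free", 0.7), ("click", 0.9), ("offer", 0.6)],
    [("project", 0.8), ("meeting", 0.5), ("report", 0.6)],
]
labels = [1, 0, 1, 0]

vocab = build_vocabulary(emails, max_features=4)
X = [to_feature_vector(e, vocab) for e in emails]

net = TinyNet(n_in=len(vocab), n_hidden=3)
for _ in range(2000):
    for x, y in zip(X, labels):
        net.train_step(x, y, lr=0.5)

predictions = [net.forward(x)[1] for x in X]
```

In this sketch the input layer has one neuron per vocabulary word, so `max_features` directly sets the input dimensionality, and `n_hidden` is the knob the hidden-layer question is about: too few units may underfit, too many may overfit a small training set. Selecting by document frequency is only one simple option; other reduction techniques could be swapped in without changing how the vectors are fed to the network.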
Thanks. Looking forward to some clarity.