I have been looking at email datasets for machine learning and noticed that each email contains header information in addition to the message body. Is it best to skip the headers and focus on the body, or should the headers be included? Does the answer depend on the task?
For training Word2Vec, should headers be used?
For classifying email as spam or non-spam, should headers be used?
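
For context, this is roughly how I would separate the two parts so that either one can be kept or dropped per task. It is only a minimal sketch using Python's standard `email` package; the sample message and the `split_email` helper are made up for illustration and are not taken from any particular dataset.

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample message, invented for illustration.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly report

Hi Bob,
The report is attached.
"""


def split_email(raw: str):
    """Return (headers, body) for a raw RFC 5322 style message string."""
    msg = message_from_string(raw, policy=default)
    # Lower-case header names for easy lookup. Repeated headers
    # (e.g. Received) collapse to the last value in this simple dict.
    headers = {name.lower(): str(value) for name, value in msg.items()}
    # Prefer the plain-text part; fall back to an empty body if none exists.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    return headers, body


headers, body = split_email(RAW)
print(headers["subject"])
print(body)
```

With the two parts split like this, the body alone could be fed to Word2Vec, while a spam classifier could use the body together with selected header fields.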