I’m developing a Naive Bayes classifier using the following dataset (https://www.kaggle.com/crowdflower/twitter-user-gender-classification/data).
What I’m trying to do is train a classifier that predicts a user’s gender from the Twitter text, the Twitter profile description, and the Twitter profile sidebar color. Since the Twitter text and profile description attributes are string columns, I need to preprocess the data before training the classifier. I saw that many examples use the Strings to Document node for this. The resulting Document column is then preprocessed with other nodes such as Number Filter, Case Converter, and so on.
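To make clear what I mean by preprocessing, here is a rough plain-Python sketch of what I understand a Case Converter followed by a Number Filter to do on a single string. The function name `preprocess` and the tokenization regex are my own assumptions for illustration, not KNIME’s actual implementation.

```python
import re


def preprocess(text: str) -> list[str]:
    """Approximate a lowercase conversion followed by a number filter."""
    # Case conversion: lowercase the whole string.
    lowered = text.lower()
    # Crude tokenization (assumption): keep runs of letters, digits, '#', '@' and apostrophes.
    tokens = re.findall(r"[a-z0-9#@']+", lowered)
    # Number filtering: drop tokens made up only of digits.
    return [token for token in tokens if not token.isdigit()]


if __name__ == "__main__":
    print(preprocess("I have 2 Cats and 10 dogs"))
```

In my case the same kind of cleaning would have to be applied to both the Twitter text and the profile description.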
Since I want to use more than one attribute to train my classifier, what do I have to do? Should I convert both string attributes (Twitter text and profile description) into documents?