I am working on a machine learning project on tweets that includes a classification problem, so I have a training set and a test set of tweets.
On the training set, I computed a TF-IDF matrix with the "tm" R package:
library(tm)
text_matrix <- DocumentTermMatrix(myCorpus_2,
    control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))
Now I want to build a similar document-term matrix for my test dataset, with the same words as columns.
I do not know how to generate a TF-IDF matrix while specifying the list of columns I want. Does anyone know how to do this?
EDIT: Actually, I am looking for an equivalent of sklearn.feature_extraction.text.TfidfVectorizer in R.
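To make the goal concrete, here is a minimal base R sketch of the behaviour I am after. It is not a tm solution. The toy tweets, the whitespace tokenizer, and the helper names (tokenize, count_terms) are made up for illustration, and the IDF formula log2(N / df) is my assumption about what weightTfIdf computes with normalize = FALSE.

```r
# Toy data standing in for the real tweets (hypothetical examples)
train_docs <- c("good morning twitter", "good night", "bad morning")
test_docs  <- c("good evening twitter", "bad bad night")

# Naive tokenizer: lower-case and split on whitespace (assumption, not tm's tokenizer)
tokenize <- function(docs) strsplit(tolower(docs), "\\s+")

# Count terms per document, keeping exactly the columns listed in `vocab`;
# words outside `vocab` are silently dropped
count_terms <- function(docs, vocab) {
  counts <- do.call(rbind, lapply(tokenize(docs), function(tokens) {
    as.integer(table(factor(tokens, levels = vocab)))
  }))
  colnames(counts) <- vocab
  counts
}

# "Fit" step: vocabulary and IDF weights come from the training set only
vocab        <- sort(unique(unlist(tokenize(train_docs))))
train_counts <- count_terms(train_docs, vocab)
doc_freq     <- colSums(train_counts > 0)
idf          <- log2(nrow(train_counts) / doc_freq)  # assumed IDF definition

# "Transform" step: apply the same columns and the same IDF to both sets
train_tfidf <- sweep(train_counts, 2, idf, `*`)
test_tfidf  <- sweep(count_terms(test_docs, vocab), 2, idf, `*`)

print(colnames(test_tfidf))
```

Here the test matrix has exactly the training columns, the unseen word "evening" is dropped, and the IDF weights come from the training set only. I would like to obtain the same result with tm (or another package) on the real corpus.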