I was wondering if anyone could tell me how to find the optimal number of unique words (the vocabulary size) to use as features for predictive models in text mining. The task is a sentiment analysis, which I can do without problems for a predetermined number of words.
However, I have to find a way to test the accuracy with n words and then choose the n that yields the highest accuracy. Is there a metric that one could use to do so? The assignment mentioned something about cross-validation; however, I am pretty sure that was referring to the predictive models.
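To make the question concrete, here is a minimal sketch of the procedure I have in mind, in plain Python. Everything in it is my own assumption rather than something from the assignment: the function names (`top_n_vocab`, `train_nb`, `predict_nb`, `cv_accuracy`, `best_vocab_size`) are hypothetical, documents are assumed to be already tokenized into lists of words, "n words" is taken to mean the n most frequent words, and a small naive Bayes classifier stands in for whatever predictive model is actually used. It treats n as a tuning parameter and scores each candidate by k-fold cross-validated accuracy.

```python
from collections import Counter
import math

def top_n_vocab(docs, n):
    """Return the n most frequent words across the tokenized docs."""
    counts = Counter(w for doc in docs for w in doc)
    return {w for w, _ in counts.most_common(n)}

def train_nb(docs, labels, vocab):
    """Fit multinomial naive Bayes restricted to vocab, with add-one smoothing."""
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        word_counts[y].update(w for w in doc if w in vocab)
    model = {}
    for c in classes:
        total = sum(word_counts[c].values()) + len(vocab)
        log_prior = math.log(class_counts[c] / len(labels))
        log_like = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
        model[c] = (log_prior, log_like)
    return model

def predict_nb(model, doc):
    """Return the class with the highest log posterior; out-of-vocab words are ignored."""
    scores = {
        c: log_prior + sum(log_like[w] for w in doc if w in log_like)
        for c, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)

def cv_accuracy(docs, labels, n_words, k=5):
    """Mean k-fold accuracy when only the n_words most frequent words are kept."""
    accs = []
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        # The vocabulary is chosen from the training fold only, so the
        # held-out documents do not influence which words are kept.
        vocab = top_n_vocab(train_docs, n_words)
        model = train_nb(train_docs, train_y, vocab)
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return sum(accs) / k

def best_vocab_size(docs, labels, candidates, k=5):
    """Score every candidate n and return (best n, {n: mean accuracy})."""
    scores = {n: cv_accuracy(docs, labels, n, k) for n in candidates}
    return max(scores, key=scores.get), scores
```

With this, something like `best_vocab_size(docs, labels, [100, 500, 1000, 5000])` would return the candidate with the highest mean accuracy (the first one listed in case of a tie) together with all the scores. Accuracy is used here only because the assignment talks about accuracy; any other metric could be swapped in inside `cv_accuracy`.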
Could someone help me out with this problem?