Finding the optimal number of unique words for a predictive model with target x, using cross-validation

Question

I was wondering if anyone could tell me how to find the optimal number of unique words to use as features in a predictive model when text mining. The model is part of a sentiment analysis, which works completely fine for a predetermined number of words.

However, I have to find a way to test the accuracy with n words and eventually choose the n that yields the highest accuracy. Is there a metric that one could use to do so? The assignment mentioned something about cross-validation, but I am pretty sure that was referring to comparing the predictive models.

Could someone help me out with this problem?

Hi Frank - this is not an SO-specific question and is likely better suited to Cross Validated. But I'm not sure I understand why you think cross-validation is not what you need. Sounds to me like it is. — Phil, May 10 '20 at 03:55
Hi Philip, could you maybe tell me how I could use cross-validation to do so then? The question is indeed vague, so this is what the assignment asks: "Estimate a model that predicts based on a given review text whether a review is a five-star review or not. In doing so, you will have to appropriately pre-process the text of the reviews. Use cross-validation to determine how many unique words to include in the model and to compare the performance of multiple models." I was able to work everything out except how to find the right number of words for the best model. — Frank Li, May 10 '20 at 09:21
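
One way to read the assignment is to treat the number of unique words as a hyperparameter: for each candidate n, run k-fold cross-validation and keep the n with the highest mean held-out accuracy. Below is a minimal sketch of that idea in plain Python. It is not the assignment's intended solution: the naive Bayes classifier, the "n most frequent words" selection rule, the function names and the toy reviews are all assumptions made for illustration. Note that the vocabulary is rebuilt from the training folds only, so the held-out fold does not leak into the word selection.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real pre-processing."""
    return text.lower().split()


def top_n_vocab(docs, n):
    """Return the n most frequent words in the given (training) documents."""
    counts = Counter(w for d in docs for w in tokenize(d))
    return {w for w, _ in counts.most_common(n)}


def train_nb(docs, labels, vocab):
    """Count in-vocabulary words per class for a multinomial naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for d, y in zip(docs, labels):
        word_counts[y].update(w for w in tokenize(d) if w in vocab)
    return word_counts, class_counts, vocab


def predict_nb(model, doc):
    """Return the class (0 or 1) with the higher smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y in (0, 1):
        # Add-one smoothing on the prior and on the word likelihoods.
        score = math.log((class_counts[y] + 1) / (total_docs + 2))
        denom = sum(word_counts[y].values()) + len(vocab)
        for w in tokenize(doc):
            if w in vocab:
                score += math.log((word_counts[y][w] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best


def cv_accuracy(docs, labels, n_words, k=5, seed=0):
    """k-fold cross-validated accuracy when only n_words words are kept."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tr_docs = [docs[i] for i in train]
        tr_labels = [labels[i] for i in train]
        vocab = top_n_vocab(tr_docs, n_words)  # chosen from training folds only
        model = train_nb(tr_docs, tr_labels, vocab)
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in fold)
    return correct / len(docs)


def best_vocab_size(docs, labels, candidates, k=5):
    """Score every candidate vocabulary size and return the best one."""
    scores = {n: cv_accuracy(docs, labels, n, k) for n in candidates}
    return max(scores, key=scores.get), scores


# Made-up toy data: 1 = five-star review, 0 = anything else.
reviews = [
    "great product love it", "excellent quality great value",
    "love it works perfectly", "perfect great purchase",
    "terrible waste of money", "broke after one day terrible",
    "poor quality not worth it", "bad product waste of time",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

best_n, scores = best_vocab_size(reviews, labels, candidates=[2, 5, 20], k=4)
print(best_n, scores)
```

The same loop can compare different model types: keep the folds fixed and swap the classifier. If you use scikit-learn, its `CountVectorizer` has a `max_features` parameter that keeps only the most frequent terms, and placing it in a `Pipeline` searched with `GridSearchCV` is a common way to tune the number of words by cross-validation.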
