I have a DataFrame of sentences on which I ran CountVectorizer with a predefined vocabulary. For some of the vocabulary terms the count is 0, even though the sentences contain those terms. The terms that do not work are:
* 1 time
* 1 report
* 7 increase
* not a good fit
* not a great fit
* c level
* not a need
The CountVectorizer is defined as follows:
CountVectorizer(vocabulary=cols, ngram_range=(1, 5))
where cols is the predefined vocabulary.
I'm pretty sure this has to do with the tokenizer settings, but I'm not sure how to change them to get what I need. Any help would be appreciated. Thanks!
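
For reference, here is a minimal sketch that reproduces the behaviour and tries one possible change. The sentences and the shortened cols list are made up for illustration (only the vocabulary terms come from the list above). It relies on the fact that scikit-learn's default token_pattern is r"(?u)\b\w\w+\b", which only keeps tokens of two or more characters, so "1", "7", "a" and "c" are dropped before the n-grams are built. Whether relaxing the pattern is the right fix for the real data is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins for the real data; only the terms in cols come from the question.
cols = ["1 time", "c level", "not a good fit", "good fit"]
sentences = [
    "we met 1 time with the c level team",
    "this is not a good fit",
]

# Default settings: single-character tokens ("1", "c", "a") are discarded,
# so n-grams that contain them are never generated.
default_vec = CountVectorizer(vocabulary=cols, ngram_range=(1, 5))
default_counts = default_vec.transform(sentences).toarray()
print(default_counts)
# [[0 0 0 0]
#  [0 0 0 1]]

# Possible change (an assumption, not a confirmed fix): accept tokens of one
# or more word characters by relaxing token_pattern.
relaxed_vec = CountVectorizer(
    vocabulary=cols,
    ngram_range=(1, 5),
    token_pattern=r"(?u)\b\w+\b",
)
relaxed_counts = relaxed_vec.transform(sentences).toarray()
print(relaxed_counts)
# [[1 1 0 0]
#  [0 0 1 1]]
```

Every term in the list above contains at least one single-character token, which is consistent with this explanation. A term like "good fit" is counted under both settings because none of its tokens is shorter than two characters.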