In a sentiment analysis with R, I want to keep a multi-word term such as "new york" as a single token during text cleaning, instead of "new" and "york" separately. How can I do this? When I call `DocumentTermMatrix` and look at the word frequencies, "new" and "york" appear as separate words. Thanks.
-
NER (Named Entity Recognition) will help with that. In R, `spacyr` can be used for that purpose. – Till Feb 09 '22 at 17:07
-
Thanks, I will try. – Francesco Borro Feb 09 '22 at 17:13
-
I can't understand how to do it. – Francesco Borro Feb 09 '22 at 17:43
-
NER will mark "New York" as a city. You can then make sure that city names (and other multi-word entities) are not split, for example by replacing the space with an underscore (e.g. New_York). – Till Feb 09 '22 at 17:47
-
OK, but I don't understand when and where NER should be applied in the text cleaning, or with which function. – Francesco Borro Feb 09 '22 at 17:58
-
Does this answer your question? [How to include select 2-word phrases as tokens in tidytext?](https://stackoverflow.com/questions/57303849/how-to-include-select-2-word-phrases-as-tokens-in-tidytext) – Till Feb 09 '22 at 20:16
-
Take a look again at [ask]. Questions should have more specific detail than this. – camille Feb 09 '22 at 21:22
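
The underscore idea from the comments can be sketched in base R without any NER step, assuming you already know which phrases you want to keep together. The sample texts and the `phrases` vector below are hypothetical, and the whitespace tokenization at the end is only there to check the counts; it is not what `DocumentTermMatrix` does internally.

```r
# Hypothetical sample texts; replace with your own documents.
texts <- c("I love New York in the fall",
           "new york is bigger than york")

# Assumed, hand-made list of phrases that should stay together.
phrases <- c("new york")

clean <- tolower(texts)
for (p in phrases) {
  # Join the words of each phrase with an underscore, e.g. "new_york".
  joined <- gsub(" ", "_", p, fixed = TRUE)
  clean <- gsub(p, joined, clean, fixed = TRUE)
}

# Simple whitespace tokenization, just to inspect the frequencies.
tokens <- unlist(strsplit(clean, "\\s+"))
freq <- table(tokens)
print(freq)
```

If you do the cleaning with `tm`, keep in mind that `removePunctuation` treats the underscore as punctuation, so the replacement probably needs to happen after that step. It is worth checking on your own data that the joined term then shows up as a single column in the document-term matrix.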