Text cleaning in R - Sentiment Analysis

Asked Feb 09 '22 at 16:55

Active Feb 09 '22 at 16:55

Viewed 55 times

In a sentiment analysis with R, how can I keep a multi-word term such as "new york" as a single token during text cleaning, instead of having "new" and "york" treated separately? When I build the `DocumentTermMatrix` and look at the word frequencies, "new" and "york" appear as separate single words. Thanks.

asked Feb 09 '22 at 16:55

Francesco Borro


NER (Named Entity Recognition) will help with that. In R `spacyr` can be used for that purpose. – Till Feb 09 '22 at 17:07
Thanks, I will try. – Francesco Borro Feb 09 '22 at 17:13
I can't understand how to do it. – Francesco Borro Feb 09 '22 at 17:43
NER will mark "New York" as a city. You can then make sure that city names (and other multi-word entities) are not split, for example by replacing the space with an underscore (e.g. New_York). – Till Feb 09 '22 at 17:47
OK, but I don't understand when and where NER should be applied during the text cleaning, or with which function. – Francesco Borro Feb 09 '22 at 17:58
Does this answer your question? [How to include select 2-word phrases as tokens in tidytext?](https://stackoverflow.com/questions/57303849/how-to-include-select-2-word-phrases-as-tokens-in-tidytext) – Till Feb 09 '22 at 20:16
Take a look again at [ask]. Questions should have more specific detail than this – camille Feb 09 '22 at 21:22
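
A minimal sketch of the underscore approach suggested in the comments above, assuming the multi-word phrases are already known (NER, e.g. via `spacyr`, could supply them instead). `join_phrases` is a hypothetical helper, not part of any package, and the example documents are made up for illustration. It uses base R only, and assumes the phrases contain just letters and spaces.

```r
# Hypothetical helper: join each known multi-word phrase with underscores
# so that a whitespace tokenizer keeps it as a single term.
# Assumes phrases are lowercase and contain only letters and spaces.
join_phrases <- function(text, phrases) {
  for (p in phrases) {
    pattern <- paste0("\\b", p, "\\b")             # match whole words only
    joined  <- gsub(" ", "_", p, fixed = TRUE)     # "new york" -> "new_york"
    text    <- gsub(pattern, joined, text, perl = TRUE)
  }
  text
}

# Made-up example documents
docs <- c("I love New York in the fall",
          "New York is new to me",
          "York is in England")

# Lowercase first so the phrase list only needs one spelling
cleaned <- join_phrases(tolower(docs), phrases = c("new york"))

# Simple whitespace tokenization and term frequencies
tokens <- unlist(strsplit(cleaned, "\\s+"))
freq   <- table(tokens)

print(cleaned[1])        # "i love new_york in the fall"
print(freq[["new_york"]]) # 2
print(freq[["new"]])      # 1
```

The cleaned text could then go through the usual corpus and `DocumentTermMatrix` steps. If `removePunctuation` from `tm` is part of the cleaning, the join probably has to happen after it, since underscores count as punctuation and would be stripped.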

0 Answers