How to correctly remove stop words using tidytext package in R?

Question

I am using the stop_words dataset from the tidytext package in R to remove stop words. I am using the following code:

library(tidyverse)
library(tidytext)
library(dplyr)

data(stop_words)
example_words <- c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog","i'm","don’t","it’s","i’ve")
filtered_words <- example_words[!example_words %in% stop_words$word]
filtered_words

The final output is as follows:

> filtered_words
[1] "quick" "brown" "fox"   "jumps" "lazy"  "dog"   "don’t" "it’s"  "i’ve"

The stop words "don’t", "it’s" and "i’ve" are still present in the filtered output, even though they appear in the stop word dataset. Why are some words that are in the stop words dataset not being removed?

score 3 · Accepted Answer · answered Apr 06 '23 at 22:26

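Your words use a typographic (curly) apostrophe (’), while the matching entries in stop_words use the straight apostrophe ('). To %in%, "don’t" and "don't" are different strings, so nothing matches. Try replacing your typographic apostrophe with this: '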

library(tidyverse)
library(tidytext)
library(dplyr)

data(stop_words)
example_words <- c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog","i'm","don't","it's","i've")
filtered_words <- example_words[!example_words %in% stop_words$word]
filtered_words 
#> [1] "quick" "brown" "fox"   "jumps" "lazy"  "dog"

Created on 2023-04-07 with reprex v2.0.2
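
If the words come from real text rather than a hand-typed vector, retyping the apostrophes is not practical, so you may prefer to normalize them in code before filtering. Below is a minimal sketch of that idea, assuming the only problem characters are the curly single quotes U+2018 and U+2019. The helper name normalize_apostrophes is made up for this example and is not part of tidytext.

```r
library(tidytext)

data(stop_words)

# Hypothetical helper (not part of tidytext): replace curly single quotes
# (U+2018, U+2019) with the straight ASCII apostrophe.
normalize_apostrophes <- function(x) {
  gsub("[\u2018\u2019]", "'", x)
}

# Same words as in the question; "\u2019" is the typographic apostrophe.
example_words <- c("the", "quick", "brown", "fox", "jumps", "over", "the",
                   "lazy", "dog", "i'm", "don\u2019t", "it\u2019s", "i\u2019ve")

# Normalize first, then filter against the stop word list.
normalized_words <- normalize_apostrophes(example_words)
filtered_words <- normalized_words[!normalized_words %in% stop_words$word]
filtered_words
#> [1] "quick" "brown" "fox"   "jumps" "lazy"  "dog"
```

The same normalization can be applied to a text column before tokenizing, so that the tokens line up with the entries in stop_words.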
