I'm currently taking a MOOC and trying my hand at some sentiment analysis, but I'm having trouble with the R code.
I have a list of bad words and a list of good words. For instance, my bad words are c("dent", "broken", "wear", "cracked"), etc.
My data frame has a column of descriptions. For each row, I want to count how many of my bad words and how many of my good words appear in that row's description.
For instance, suppose this is my data frame:
desc = c("this screen is cracked", "minor dents and scratches", "100% good", "in perfect condition")
id = c(1,2,3,4)
df = data.frame(id, desc)
bad.words = c("cracked", "scratches", "dents")
What I want is to add a sum column that counts how many times the bad words appear in each description,
so I'm hoping my final df would look like this:
id desc                        sum
1  "this screen is cracked"    1
2  "minor dents and scratches" 2
3  "100% good"                 0
4  "in perfect condition"      0
What I have so far is:
df$sum <- grepl(paste( bad.words, collapse="|"), df$desc)
which only gets me TRUE or FALSE depending on whether any of the bad words appears in the description, not a count of how many appear.
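For reference, here is a minimal sketch of one possible way to get a count instead of a logical, using base R's gregexpr, which reports the position of every match rather than just whether one exists. The helper name count_matches is hypothetical (it is not from any package), and the sketch assumes plain substring matching is acceptable, so "dent" would also be counted inside "dents" or "accident".

```r
# Example data copied from the question.
desc <- c("this screen is cracked", "minor dents and scratches",
          "100% good", "in perfect condition")
id <- c(1, 2, 3, 4)
df <- data.frame(id, desc, stringsAsFactors = FALSE)
bad.words <- c("cracked", "scratches", "dents")

# count_matches is a hypothetical helper, named here for illustration only.
# It counts every occurrence of any word in `words` within each element of `text`.
count_matches <- function(text, words) {
  # Same alternation pattern as in the grepl attempt: "cracked|scratches|dents"
  pattern <- paste(words, collapse = "|")
  # gregexpr returns one vector of match positions per input string,
  # or -1 when the string contains no match at all.
  hits <- gregexpr(pattern, as.character(text))
  # Positions greater than 0 are real matches, so counting them gives the total.
  vapply(hits, function(h) sum(h > 0), integer(1))
}

df$sum <- count_matches(df$desc, bad.words)
print(df)
```

The same helper could be called with the good-word list to fill a second column. Note that a word repeated within one description is counted each time it occurs. If only whole words should count, wrapping the pattern in word boundaries, for example paste0("\\b(", pattern, ")\\b"), is one option to try.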