
Suppose we have a data frame (df) containing comments (each row is a comment):

comment
Amazing job
Terrible work

And we have a dictionary (dict) of positive and negative words:

positive negative
amazing  terrible

I'm trying to create two word clouds: one of the positive comments in df and one of the negative comments. I tried the following code, but it doesn't give me what I want:

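library("quanteda")

# corpus() looks for a column called "text" by default,
# so point it at the "comment" column instead
corpus_example <- corpus(df, text_field = "comment")
head(corpus_example)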

Output:

text1:
"Amazing job"

text2:
"Terrible work"

Next, I create a dfm (document-feature matrix):

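# dict must be a quanteda dictionary object, e.g.
# dict <- dictionary(list(positive = "amazing", negative = "terrible"))
# Recent quanteda versions build the dfm from tokens and apply the dictionary with dfm_lookup()
comments_dfm <- dfm_lookup(dfm(tokens(corpus_example)), dictionary = dict)
head(comments_dfm)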

Output:
      positive negative
text1 1        0
text2 0        1

That is, it counts how many positive and negative words (according to dict) occur in text1 and text2, so text1 is considered positive and text2 negative.
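To make the counting step concrete, here is a minimal base-R sketch (no quanteda) that reproduces the counts in the dfm above and turns them into one label per comment. The helper count_hits, the tokenizing regex and the "neutral" label for ties are illustrative assumptions, not quanteda behavior.

```r
# Example data and word lists from the question (lower-cased)
df <- data.frame(comment = c("Amazing job", "Terrible work"),
                 stringsAsFactors = FALSE)
dict <- list(positive = "amazing", negative = "terrible")

# Hypothetical helper: for each comment, count the tokens found in `words`
count_hits <- function(comments, words) {
  toks <- strsplit(tolower(comments), "[^a-z']+")
  vapply(toks, function(tok) sum(tok %in% words), numeric(1))
}

# One row per comment, one column per dictionary category (like the dfm)
counts <- vapply(dict, count_hits, numeric(nrow(df)), comments = df$comment)
rownames(counts) <- paste0("text", seq_len(nrow(df)))
counts

# Label each comment by whichever category has more hits; ties become "neutral"
label <- ifelse(counts[, "positive"] > counts[, "negative"], "positive",
         ifelse(counts[, "negative"] > counts[, "positive"], "negative",
                "neutral"))
label
```

The labels can then be used to split df$comment into a positive and a negative group before drawing each cloud.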

Finally, I tried to create the word clouds with textplot_wordcloud(comments_dfm), but this only returns a word cloud of the feature names (column headers) of comments_dfm, i.e. the words positive and negative. Instead, I want two word clouds: one containing Amazing job (because it's considered a positive comment) and the other containing Terrible work (because it's a negative comment).

Does anyone know how to fix this?

1 Answer


So a few things:

  • positive and negative appear because the dictionary lookup has "mapped" the words in Amazing job and Terrible work to those categories, so the features of the dfm are the category names rather than the original words. Dictionaries are used to map raw text to categories so that the data can be interpreted in a meaningful way (e.g. analyzing word frequency to understand sentiment).
  • However, I don't think you need quanteda at all; see the example below for the positive word cloud.
  • Since you want to preserve phrases, count whole comments with table(); see Creating "word" cloud of phrases, not individual words in R
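To see why table() preserves phrases, compare counting the raw comment strings with counting the words inside them. This is a small base-R illustration on made-up comments; splitting on a single space is a simplifying assumption.

```r
# Made-up example comments
comments <- c("Great job", "Great job", "Amazing job")

# table() on the raw strings treats each whole comment as one "word"
phrase_freq <- table(comments)
phrase_freq

# Splitting on spaces first gives per-word counts instead
word_freq <- table(unlist(strsplit(comments, " ", fixed = TRUE)))
word_freq
```

Passing names(phrase_freq) and as.numeric(phrase_freq) to wordcloud() then draws each comment as a single item.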
library(wordcloud)

df <- data.frame(comment = c("Amazing job",
                             "Terrible work",
                             "Great job",
                             "Great job",
                             "Great job",
                             "Great job",
                             "Fantastic job",
                             "Fantastic job",
                             "Fantastic job",
                             "Amazing job",
                             "Amazing job",
                             "Terrible work",
                             "Terrible work"),
                 stringsAsFactors = FALSE)  # keep comments as character, not factor (matters in R < 4.0)

dict <- list(
  Positive = c("Amazing","Great","Fantastic"),
  Negative = c("Terrible","Bad","Suck")
)

# Returns the comments containing at least one dictionary entry
# (case-insensitive match; pass dict$Positive or dict$Negative)
find_matches <- function(comments, dictionary){
  comments[grepl(paste(dictionary, collapse = "|"),
                 comments,
                 ignore.case = TRUE)]
}

# Since you want phrases, count whole comments with table()
positive_table <- table(find_matches(df$comment, dict$Positive))
wordcloud::wordcloud(
  names(positive_table),       # the phrases to draw
  as.numeric(positive_table),  # how often each phrase occurs
  scale = c(2, 1),
  min.freq = 3,
  max.words = 100,
  random.order = TRUE
)
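The question asks for two clouds, while the code above only draws the positive one. A possible extension, sketched below under the assumption that the wordcloud package is installed (the plotting step is skipped otherwise), builds both tables and draws them side by side with par(mfrow = c(1, 2)). It also swaps in a variant of find_matches that wraps the pattern in \\b word boundaries (with perl = TRUE), so that an entry such as "Great" would not also match inside a longer word like "Greatest"; this boundary tweak is optional and not part of the approach above. min.freq = 1 is used so every phrase is kept.

```r
# Example data from the answer, kept as character (not factor)
df <- data.frame(comment = c("Amazing job", "Terrible work",
                             rep("Great job", 4),
                             rep("Fantastic job", 3),
                             "Amazing job", "Amazing job",
                             "Terrible work", "Terrible work"),
                 stringsAsFactors = FALSE)

dict <- list(
  Positive = c("Amazing", "Great", "Fantastic"),
  Negative = c("Terrible", "Bad", "Suck")
)

# Variant of find_matches: the \\b anchors make each dictionary entry
# match only as a whole word (e.g. "Great" no longer matches "Greatest")
find_matches <- function(comments, dictionary) {
  pattern <- paste0("\\b(", paste(dictionary, collapse = "|"), ")\\b")
  comments[grepl(pattern, comments, ignore.case = TRUE, perl = TRUE)]
}

# Whole-comment frequency tables, one per category
positive_table <- table(find_matches(df$comment, dict$Positive))
negative_table <- table(find_matches(df$comment, dict$Negative))

# Draw the two clouds side by side (only if wordcloud is installed)
if (requireNamespace("wordcloud", quietly = TRUE)) {
  old_par <- par(mfrow = c(1, 2))
  wordcloud::wordcloud(names(positive_table), as.numeric(positive_table),
                       scale = c(2, 1), min.freq = 1, random.order = TRUE)
  wordcloud::wordcloud(names(negative_table), as.numeric(negative_table),
                       scale = c(2, 1), min.freq = 1, random.order = TRUE)
  par(old_par)  # restore the previous panel layout
}
```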

Ritz735