R - Finding top words in each NRC sentiment and emotion using syuzhet package

Question

Snapshot of the dataset:

I'm getting the following chart:

Here is the code:

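library(dplyr)    # %>%, count(), joins, rename()
library(ggplot2)  # ggplot(), geom_col(), facet_wrap()
library(tidytext)
library(syuzhet)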

lyrics$lyric <- as.character(lyrics$lyric)

tidy_lyrics <- lyrics %>% 
  unnest_tokens(word,lyric)

song_wrd_count <- tidy_lyrics %>% count(track_title)

lyric_counts <- tidy_lyrics %>%
  left_join(song_wrd_count, by = "track_title") %>% 
  rename(total_words=n)

lyric_sentiment <- tidy_lyrics %>% 
  inner_join(get_sentiments("nrc"),by="word")

lyric_sentiment %>% 
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10, n) %>%   # top 10 words per sentiment, ranked by count n
  ungroup() %>%
  ggplot(aes(x = reorder(word, n), y = n, fill = sentiment)) + 
  geom_col(show.legend = FALSE) + 
  facet_wrap(~sentiment, scales = "free") + 
  coord_flip()

The issue is that I'm not sure whether the result I'm getting is correct. For instance, you can see that 'bad' appears under multiple emotions. Also, if we inspect lyric_sentiment, the word 'shame' is present four times for 'Tim McGraw', whereas in reality it appears only twice in this song.

What's the right approach?

score 1 · Accepted Answer · answered Jul 11 '18 at 12:59

You are doing it correctly. The NRC lexicon can assign a word to several sentiment categories, so the inner join returns one row per matching category for every occurrence of the word. You can see this in the following example. You can also look up values on the NRC homepage.

library(dplyr)
library(tidytext)

nrc <- get_sentiments("nrc")
nrc %>% filter(word %in% c("bad", "shame"))
# A tibble: 9 x 2
  word  sentiment
  <chr> <chr>    
1 bad   anger    
2 bad   disgust  
3 bad   fear     
4 bad   negative 
5 bad   sadness  
6 shame disgust  
7 shame fear     
8 shame negative 
9 shame sadness
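
To check how the join inflates row counts, here is a minimal sketch on made-up data. The two-line lyric, the object names (lyrics_toy, nrc_subset, comparison) and the column names occurrences and joined_rows are assumptions for illustration only; nrc_subset is copied by hand from the rows printed above so the example runs without downloading the lexicon. Recent dplyr versions may warn about a many-to-many join here, which is expected.

```r
library(dplyr)
library(tidytext)

# Hypothetical lyric lines, not taken from the real dataset
lyrics_toy <- tibble(
  track_title = "Toy Song",
  lyric = c("such a shame", "bad bad shame")
)

# Hand-copied subset of the NRC rows shown in the output above
nrc_subset <- tibble(
  word = c(rep("bad", 5), rep("shame", 4)),
  sentiment = c("anger", "disgust", "fear", "negative", "sadness",
                "disgust", "fear", "negative", "sadness")
)

tidy_toy <- lyrics_toy %>%
  unnest_tokens(word, lyric)

# How often each word really occurs in the song
occurrences <- tidy_toy %>%
  count(track_title, word, name = "occurrences")

# How many rows each word has after joining with the lexicon:
# one row per occurrence per sentiment category
joined_rows <- tidy_toy %>%
  inner_join(nrc_subset, by = "word") %>%
  count(track_title, word, name = "joined_rows")

comparison <- occurrences %>%
  inner_join(joined_rows, by = c("track_title", "word"))

print(comparison)
```

In this toy case 'shame' occurs twice but has eight joined rows (two occurrences times four categories). Counting by word and sentiment, as the plotting code in the question does, gives the real occurrence count within each facet; only totals taken across sentiments are inflated.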
