The following code
library(dplyr)
library(janeaustenr)
library(tidytext)

book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE)

book_words <- book_words %>%
  bind_tf_idf(word, book, n)

book_words
taken from "Term Frequency and Inverse Document Frequency (tf-idf) Using Tidy Data Principles" estimates the tf-idf
of the words in Jane Austen's works. However, this code appears to be specific to Jane Austen's books. I would like, instead, to derive the tf-idf
for the following data frame:
sentences <- c("The color blue neutralizes orange yellow reflections.",
               "Zod stabbed me with blue Kryptonite.",
               "Because blue is your favourite colour.",
               "Red is wrong, blue is right.",
               "You and I are going to yellowstone.",
               "Van Gogh looked for some yellow at sunset.",
               "You ruined my beautiful green dress.",
               "You do not agree.",
               "There's nothing wrong with green.")

df <- data.frame(text = sentences,
                 class = c("A", "B", "A", "C", "A", "B", "A", "C", "D"),
                 weight = c(1, 1, 3, 4, 1, 2, 3, 4, 5))
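
For reference, here is a minimal sketch of how I imagine the same pipeline might be adapted to this data frame. It assumes that each sentence (row) counts as one "document"; the `sentence_id` column and the `sentence_words` name are my own additions for illustration and are not part of the original example.

```r
library(dplyr)
library(tidytext)

# Same data as above, repeated so the snippet runs on its own
df <- data.frame(
  text = c("The color blue neutralizes orange yellow reflections.",
           "Zod stabbed me with blue Kryptonite.",
           "Because blue is your favourite colour.",
           "Red is wrong, blue is right.",
           "You and I are going to yellowstone.",
           "Van Gogh looked for some yellow at sunset.",
           "You ruined my beautiful green dress.",
           "You do not agree.",
           "There's nothing wrong with green."),
  class = c("A", "B", "A", "C", "A", "B", "A", "C", "D"),
  weight = c(1, 1, 3, 4, 1, 2, 3, 4, 5),
  stringsAsFactors = FALSE  # unnest_tokens() expects character input
)

sentence_words <- df %>%
  # Assumption: one row = one document, identified by a hypothetical sentence_id
  mutate(sentence_id = row_number()) %>%
  # Split each sentence into one lowercase word per row
  unnest_tokens(word, text) %>%
  # Count how often each word occurs in each sentence
  count(sentence_id, word, sort = TRUE) %>%
  # Add tf, idf and tf_idf columns, using sentence_id as the document
  bind_tf_idf(word, sentence_id, n)

sentence_words
```

If the documents should instead be the groups in `class`, I assume replacing `sentence_id` with `class` in both `count()` and `bind_tf_idf()` would give the per-class tf-idf. Note that this sketch does not use the `weight` column at all.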