I'm doing text analysis using tidytext. I am trying to calculate the tf-idf for a corpus. The standard way to do this is:

book_words <- book_words %>%
   bind_tf_idf(word, book, n)

However, in my case, the 'document' is not defined by a single column (like book). Is it possible to call bind_tf_idf where the document is defined by two columns (for example, book and chapter)?

Kewl
Not sure I understand. Can't you just bind the two columns together to yield one column of text? Something like: cbind(book, chapter) – triddle May 08 '17 at 15:34

1 Answer

Why not concatenate both columns into a single document ID? For example:

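library(tidyverse)
library(tidytext)
library(janeaustenr)

# Count how often each word occurs in each book
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE) %>%
  ungroup()

# Assign a random chapter to each row, purely to get a second column for the demo
book_words$chapter <- sample(1:10, nrow(book_words), replace = TRUE)

book_words %>%
  # Combine book and chapter into a single document ID
  unite("book_chapter", book, chapter) %>%
  bind_tf_idf(word, book_chapter, n) %>%
  # Split the ID back into its two original columns
  separate(book_chapter, c("book", "chapter"), sep = "_") %>%
  arrange(desc(tf_idf))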
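
A possible variation, sketched under a few assumptions: passing remove = FALSE to unite() keeps book and chapter alongside the combined ID, so there is no need to split the ID again with separate(), and the original column types are preserved. The toy counts below are made up for illustration, and doc_id is a hypothetical column name.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy data (invented for illustration): one row per (book, chapter, word)
chapter_words <- tibble::tribble(
  ~book, ~chapter, ~word,   ~n,
  "A",   1,        "whale", 3,
  "A",   1,        "sea",   2,
  "A",   2,        "sea",   4,
  "B",   1,        "sea",   1,
  "B",   1,        "house", 5
)

result <- chapter_words %>%
  # Build a temporary document ID but keep the source columns
  unite("doc_id", book, chapter, remove = FALSE) %>%
  bind_tf_idf(word, doc_id, n) %>%
  # Drop the helper column once tf-idf has been computed
  select(-doc_id) %>%
  arrange(desc(tf_idf))

print(result)
```

Because the ID is never parsed back, this version does not depend on the separator being absent from the book names.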
lukeA