Suppose I have a few sentences describing how John spends his days, stored in a data frame in R:
library(tibble)  # tibble() replaces the deprecated data_frame()
df <- tibble(sentence = c("John went to work this morning", "John likes to jog", "John is hungry"))
I want to identify which words occur most often in sentences that contain "John". I can use unnest_tokens() to identify consecutive words, but how can I identify recurring pairings that are non-consecutive?
The goal is to obtain a result that counts how many times each other word appears in the same sentence as John:
df2 <- tibble(word1 = rep("John", 9),
              word2 = c("went", "to", "work", "this", "morning", "likes", "jog", "is", "hungry"),
              n = c(1, 2, 1, 1, 1, 1, 1, 1, 1))
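For reference, here is a minimal sketch of one way the desired table might be produced. It assumes the dplyr and tidytext packages are installed and that "appears with John" means "appears in the same sentence as John". The names `id` and `pairs` are made up for illustration, and this is only one possible approach, not necessarily the best one.

```r
library(dplyr)
library(tidytext)

df <- tibble(sentence = c("John went to work this morning",
                          "John likes to jog",
                          "John is hungry"))

pairs <- df %>%
  mutate(id = row_number()) %>%                         # hypothetical sentence id
  unnest_tokens(word2, sentence, to_lower = FALSE) %>%  # one row per word, keep original case
  group_by(id) %>%
  filter(any(word2 == "John")) %>%                      # keep only sentences that mention John
  ungroup() %>%
  filter(word2 != "John") %>%                           # drop John itself
  count(word2, name = "n") %>%                          # occurrences of each co-occurring word
  mutate(word1 = "John", .before = word2)

pairs
```

With the three example sentences, "to" should get a count of 2 and every other word a count of 1, which matches the target table apart from row order. Note that unnest_tokens() lowercases tokens by default, so `to_lower = FALSE` is needed here to keep "John" capitalized.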