I'm new to R, and I'm using widyr to do text mining. I successfully used the methods found here to get a list of co-occurring words within each section of text and their phi coefficient.
Code as follows:
word_cors <- review_words %>%
group_by(word) %>%
pairwise_cor(word, title, sort = TRUE) %>%
filter(correlation > .15)
I understand that I can also generate a data frame with co-occurring words and the number of times they appear, using code like:
word_pairs <- review_words %>%
pairwise_count(word, title, sort = TRUE)
What I need is a table that has both the phi coefficient and the number of occurrences for each pair of words. I've been digging into pairwise_cor and pairwise_count but still can't figure out how to combine them. If I understand correctly, joins only take one column into account for matching, so I couldn't use a regular join reliably since there may be multiple pairs that have the same word in the item1 column.
Is this possible using widyr? If not, is there another package that will allow me to do this?
Here is the full code:
#Load packages
pacman::p_load(XML, dplyr, stringr, rvest, httr, xml2, tidytext, tidyverse, widyr)
#Load source material
prod_reviews_df <- read_csv("SOURCE SPREADSHEET.csv")
#Split into one word per row
review_words <- prod_reviews_df %>%
unnest_tokens(word, comments, token = "words", format = "text", drop = FALSE) %>%
anti_join(stop_words, by = c("word" = "word"))
#Find phi coefficient
word_cors <- review_words %>%
group_by(word) %>%
pairwise_cor(word, title, sort = TRUE) %>%
filter(correlation > .15)
#Write data to CSV
write.csv(word_cors, "WORD CORRELATIONS.csv")
I want to add in pairwise_count, but I need it alongside the phi coefficient.
Thank you!