
I am analyzing the pairwise correlation of words that appear in user reviews and plotting the result as a correlation network graph.

My sample data is as follows:

review_corwords

           Label Rating                word
1            1      1                connect
1.1          1      1                    gps
1.2          1      1                    app
1.3          1      1                connect
1.4          1      1                    gps
1.5          1      1                 matter
1.6          1      1                   long
1.7          1      1                    gps
1.8          1      1                    set
1.9          1      1                   high
1.10         1      1               accuracy
1.11         1      1                setting
1.12         1      1                 appear
1.13         1      1                    set
1.14         1      1                    app
1.15         1      1                useless
1.16         1      1                   cant
1.17         1      1                  track
1.18         1      1                workout
2            1      5                   wish
2.1          1      5                  would
2.2          1      5               interest
2.3          1      5                 google
2.4          1      5                provide
2.5          1      5                 weekly
2.6          1      5                monthly
2.7          1      5                summary
3            1      1                useless
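A reproducible example was requested in the comments, so here is a sketch that rebuilds the printed sample as a data frame. One assumption is clearly mine and not the poster's: the integer part of each printed row name (1, 1.1, ..., 2, 2.1, ..., 3) looks like it marks which review a word came from, so it is stored in a hypothetical `review_id` column. Everything else is copied from the table above.

```r
# Rebuild the printed sample of review_corwords so the problem can be reproduced.
# ASSUMPTION: the integer part of each printed row name (1, 1.1, ..., 2, 2.1, ..., 3)
# marks which review a word came from; it is kept as a hypothetical `review_id` column.
review_corwords <- data.frame(
  review_id = rep(1:3, times = c(19, 8, 1)),
  Label     = 1L,
  Rating    = rep(c(1L, 5L, 1L), times = c(19, 8, 1)),
  word      = c(
    # review 1 (rows 1 to 1.18)
    "connect", "gps", "app", "connect", "gps", "matter", "long", "gps", "set",
    "high", "accuracy", "setting", "appear", "set", "app", "useless", "cant",
    "track", "workout",
    # review 2 (rows 2 to 2.7)
    "wish", "would", "interest", "google", "provide", "weekly", "monthly", "summary",
    # review 3 (row 3)
    "useless"
  ),
  stringsAsFactors = FALSE
)

# Label takes a single value in this sample, so it cannot tell one review from another.
table(review_corwords$Label)
```

Note that `Label` is 1 on every row of the sample, while the row names suggest three separate reviews.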

I then count how often each pair of words co-occurs:

library(dplyr)   # provides the %>% pipe
library(widyr)
# count words co-occurring within a label
word_pairs <- review_corwords %>%
  pairwise_count(word, Label, sort = TRUE)

This returns:

# A tibble: 16,333,722 x 3
   item1    item2       n
   <chr>    <chr>   <dbl>
 1 gps      connect     1
 2 app      connect     1
 3 matter   connect     1
 4 long     connect     1
 5 set      connect     1
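Even though the output is sorted, the largest count is 1. That is consistent with a count of shared feature values: in the sample every row has `Label == 1`, so any two distinct words share exactly one Label, no matter how often each word repeats. The sketch below is a hypothetical base-R helper (not widyr's own code) that counts, for every word pair, the distinct feature values in which both words appear, using the first few rows of the sample.

```r
# Hypothetical helper, not widyr's implementation: count, for every pair of words,
# how many distinct feature values (here Label) contain both words.
cooccurrence_counts <- function(item, feature) {
  # word x feature matrix: 1 if the word appears at least once under that feature value
  incidence <- 1 * (unclass(table(item, feature)) > 0)
  counts <- tcrossprod(incidence)  # same as incidence %*% t(incidence)
  diag(counts) <- 0                # drop each word's pairing with itself
  counts
}

# First rows of the sample: every one of them has Label == 1.
words  <- c("connect", "gps", "app", "connect", "gps", "matter")
labels <- c(1, 1, 1, 1, 1, 1)
counts <- cooccurrence_counts(words, labels)
counts["gps", "connect"]  # 1: both words repeat, but they share only one Label
```

With a single Label value, every off-diagonal entry of `counts` is 1, which matches the flat `n` column above.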

However, when I run a correlation analysis on the same data, every correlation comes back as NaN:

word_cors <- review_corwords %>%
  group_by(word) %>%
  pairwise_cor(word, Label, sort = TRUE)

# A tibble: 16,333,722 x 3
   item1    item2   correlation
   <chr>    <chr>         <dbl>
 1 gps      connect         NaN
 2 app      connect         NaN
 3 matter   connect         NaN
 4 long     connect         NaN
 5 set      connect         NaN
 6 high     connect         NaN

How can I obtain meaningful correlation values for these word pairs?
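A Pearson correlation between two words needs each word's presence to vary across the feature values. If `Label` has only one value, as in the sample where every row has `Label == 1`, each word's presence vector has a single entry, its variance is undefined, and no correlation can be computed. Assuming the full data behaves like the sample, that may explain the NaN values. The sketch below is a hypothetical base-R helper, not widyr's code: it computes the phi coefficient (Pearson correlation of 0/1 presence) for a few words taken from the sample, once using `Label` as the feature and once using a hypothetical `review_id` column that separates the three reviews.

```r
# Hypothetical helper, not widyr's implementation: Pearson (phi) correlation of
# word presence across the distinct values of `feature`.
presence_cor <- function(item, feature) {
  # feature x word matrix: 1 if the word appears at least once under that feature value
  incidence <- 1 * (unclass(table(feature, item)) > 0)
  words <- colnames(incidence)
  if (nrow(incidence) < 2) {
    # One feature value means there is no variation to correlate: every entry is undefined.
    return(matrix(NA_real_, length(words), length(words), dimnames = list(words, words)))
  }
  cor(incidence)  # word x word correlation matrix
}

# A few words from the sample; `review_id` is a hypothetical column (see above).
toy <- data.frame(
  review_id = c(1, 1, 1, 2, 2, 3),
  Label     = c(1, 1, 1, 1, 1, 1),
  word      = c("connect", "gps", "useless", "wish", "summary", "useless")
)

by_label  <- presence_cor(toy$word, toy$Label)      # all NA: only one Label value
by_review <- presence_cor(toy$word, toy$review_id)  # defined correlations
round(by_review["gps", c("connect", "useless", "wish")], 2)  # 1.0, 0.5, -0.5
```

If the full data does contain a per-review identifier, the analogous widyr call would presumably pass it as the feature, e.g. `pairwise_cor(word, review_id, sort = TRUE)`; whether such a column exists in the original data is an assumption.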

IronMaiden
  • Try with `word_cors <- review_corwords %>% group_by(word) %>% mutate(freq=n()) %>% pairwise_cor(word, freq, sort = TRUE)` – Marco Sandri Feb 03 '19 at 12:06
  • There isn't quite enough information here for us to see why you are getting `NaN` for correlation results. Can you build a small, reproducible example that gives you the same result? https://www.tidyverse.org/help/ – Julia Silge Feb 25 '19 at 01:28
