Text analysis with R.

My dataset is 2000 comments from 2000 different surveys. I have created bigrams, checked word frequencies, run a word cluster analysis with hclust(), and then looked at word associations with findAssocs(), for example findAssocs(bigram_dtm, "long time", 0.2).

For example, I can see that "long time" has an association of 0.66 with "felt waiting".

I have searched online without success so far. Questions: Is there any way to print the comments where these two bigrams occur together? Is there any way to print the comments that contain "long time"?

Thanks,

Robbie
  • How is your data organized? Do you have the comments as an array of 2000 strings? If so, you can use grep to find which comments contain each of the bigrams and therefore which contain both. – G5W Oct 21 '18 at 23:18
  • Hey @G5W. str(files) returns: "$ verb: Factor w/ 239 levels...". Sorry if I am not explaining this correctly. When I import the file into R, it has 2000 rows, one comment per row. Would that help? Thanks! – Robbie Oct 24 '18 at 23:57

1 Answer

I think that what you are looking for is grep. You can use it to get the indices of the comments you are looking for or use those indices to get at the comments themselves.

Comments = c("I haven't seen you in a long time.",
    "There is no U in TEAM, but it does contain ME.",
    "In extreme cases, read the documentation.",
    "A big computer, a complex algorithm and a long time does not equal science.",
    "Use the source, Luke!")

# Indices of the comments that contain "long time"
grep("long time", Comments)
# [1] 1 4

# The matching comments themselves
Comments[grep("long time", Comments)]
# [1] "I haven't seen you in a long time."
# [2] "A big computer, a complex algorithm and a long time does not equal science."

(Some comments stolen from fortune().)
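
For the other part of the question (comments where both bigrams occur together), one option is to combine two grepl() results with &. This is only a sketch: the data frame `surveys` and its contents are made up for illustration, and the column name `verb` is taken from the str() output in the comments above. It also assumes the bigrams appear verbatim in the text being searched. If the document-term matrix was built after lowercasing or removing stop words, you would probably want to search that cleaned text instead of the raw comments.

```r
# Hypothetical data: one comment per row, stored as a factor
# (similar to what str(files) reported in the comments).
surveys <- data.frame(
  verb = factor(c("I felt waiting a long time was unfair.",
                  "It has been a long time.",
                  "I felt waiting was fine.",
                  "No complaints."))
)

# Convert the factor to plain character strings before matching.
comments <- as.character(surveys$verb)

# grepl() returns one TRUE/FALSE per comment; fixed = TRUE matches the
# phrase literally instead of treating it as a regular expression.
has_long_time    <- grepl("long time", comments, fixed = TRUE)
has_felt_waiting <- grepl("felt waiting", comments, fixed = TRUE)

# Comments that contain both bigrams.
both <- comments[has_long_time & has_felt_waiting]
print(both)
```

Using the logical vectors also makes it easy to count matches with sum(has_long_time), or to switch to "either bigram" by replacing & with |.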

G5W