
I am using R to read a text. The passage consists of 100 sentences, which are then put in a list. The list looks like this:

[[1]]

[1] "WigWagCo: For #TBT here's a video of Travis McCollum (Co-Founder and COO of WigWag) at #SXSW2016

[[2]]

[1] "chrisreedfilm: RT @hammertonail: #SXSW2016 doc THE SEER: A PORTRAIT OF WENDELL BERRY gets reviewed by @chrisreedfilm 

[[3]]

[1] "iamscottrandell: RT @therevue: Take a jaunt down #MemoriesofSXSW & read the stories of @JRNelsonMusic @thegillsmusic & @TheBlancosMusic 
...
...

[[99]]

[1] "SunPowerTalent: SunPower #Clerical #Job: Supply Chain Intern (#Austin, TX) 

[[100]]

[1] "SunPowerTalent: #Finance #Job alert: General Ledger Accountant | SunPower

Every object in the list is a "sentence" from the same text. How can I count the frequency of all 3-grams in this text and know which sentence each 3-gram comes from?

Thanks a lot.

Sotos

1 Answer


You can use the R package textcat (https://CRAN.R-project.org/package=textcat) for this. If your list of 100 sentences is called x, you can simply do:

library("textcat")
n3gram <- textcat_profile_db(x, n = 3)

This is then a list of 100 elements (corresponding to the original sentences) containing the 3-grams ordered by frequency. See ?textcat_profile_db for more details and examples.
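
If you also want the overall counts across the whole text together with the sentence(s) each 3-gram comes from, here is a minimal base-R sketch. It assumes x is the list of 100 sentences shown above, and it builds word-level 3-grams (note that textcat's profiles are based on character n-grams, so the counts will differ); ngrams3 and sentences_of are just illustrative helper names, not part of textcat:

# word 3-grams of a single sentence (hypothetical helper)
ngrams3 <- function(w) {
  if (length(w) < 3) return(character(0))
  vapply(seq_len(length(w) - 2),
         function(i) paste(w[i:(i + 2)], collapse = " "),
         character(1))
}

# split every sentence into words
words <- lapply(x, function(s) strsplit(trimws(s), "\\s+")[[1]])

# one row per 3-gram occurrence, together with its sentence number
occ <- do.call(rbind, lapply(seq_along(words), function(i) {
  g <- ngrams3(words[[i]])
  if (length(g) == 0) return(NULL)
  data.frame(ngram = g, sentence = i, stringsAsFactors = FALSE)
}))

# frequency of each 3-gram over the whole text
freq <- sort(table(occ$ngram), decreasing = TRUE)

# sentences in which a given 3-gram occurs (hypothetical helper)
sentences_of <- function(g) unique(occ$sentence[occ$ngram == g])

With this, freq[1:10] would give the ten most frequent 3-grams, and sentences_of(names(freq)[1]) would list the sentence numbers containing the most frequent one.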

Achim Zeileis