
I am using R to read a text. The passage consists of 100 sentences, which are then put in a list. The list looks like this:

[[1]]

[1] "WigWagCo: For #TBT here's a video of Travis McCollum (Co-Founder and COO of WigWag) at #SXSW2016

[[2]]

[1] "chrisreedfilm: RT @hammertonail: #SXSW2016 doc THE SEER: A PORTRAIT OF WENDELL BERRY gets reviewed by @chrisreedfilm 

[[3]]

[1] "iamscottrandell: RT @therevue: Take a jaunt down #MemoriesofSXSW & read the stories of @JRNelsonMusic @thegillsmusic & @TheBlancosMusic 
...
...

[[99]]

[1] "SunPowerTalent: SunPower #Clerical #Job: Supply Chain Intern (#Austin, TX) 

[[100]]

[1] "SunPowerTalent: #Finance #Job alert: General Ledger Accountant | SunPower

Every object in the list is a "sentence" from the same text. How can I count the frequency of all 3-grams in this text and know which sentence each 3-gram comes from?

Thanks a lot.

Sotos

1 Answer


You can use the R package textcat (https://CRAN.R-project.org/package=textcat) for this. If your list of 100 sentences is called x, you can simply do:

library("textcat")
n3gram <- textcat_profile_db(x, n = 3)

This is then a list of 100 elements (corresponding to the original sentences) containing the 3-grams ordered by frequency. See ?textcat_profile_db for more details and examples.
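
If you also want the overall counts across the whole text together with the sentence(s) each 3-gram comes from, here is a minimal base-R sketch. It assumes x is the list of 100 sentences shown above, and it builds word-level 3-grams (note that textcat's profiles are based on character n-grams, so the counts will differ); ngrams3 and sentences_of are just illustrative helper names, not part of textcat:

# word 3-grams of a single sentence (hypothetical helper)
ngrams3 <- function(w) {
  if (length(w) < 3) return(character(0))
  vapply(seq_len(length(w) - 2),
         function(i) paste(w[i:(i + 2)], collapse = " "),
         character(1))
}

# split every sentence into words
words <- lapply(x, function(s) strsplit(trimws(s), "\\s+")[[1]])

# one row per 3-gram occurrence, together with its sentence number
occ <- do.call(rbind, lapply(seq_along(words), function(i) {
  g <- ngrams3(words[[i]])
  if (length(g) == 0) return(NULL)
  data.frame(ngram = g, sentence = i, stringsAsFactors = FALSE)
}))

# frequency of each 3-gram over the whole text
freq <- sort(table(occ$ngram), decreasing = TRUE)

# sentences in which a given 3-gram occurs (hypothetical helper)
sentences_of <- function(g) unique(occ$sentence[occ$ngram == g])

With this, freq[1:10] would give the ten most frequent 3-grams, and sentences_of(names(freq)[1]) would list the sentence numbers containing the most frequent one.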

Achim Zeileis