Is there a way to create a corpus without having to store the items in files? For instance, I want to manipulate tweets or paragraphs that I am grabbing from the web. Can I do something like:

myCorpus = MyCorpus([
    ('id', 'item', 'category'), 
    ('id', 'item', 'category'),
    ('id', 'item', 'category'), 
    ... ])

Or

myCorpus.add('id', 'item', 'category')

The purpose is to manipulate the corpus with existing NLTK capabilities. I checked TextCollection, but it doesn't seem to handle categories.

Anthony Mastrean
user778417

1 Answer

Why not just write the strings out to a file or files and then process them as a corpus?
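
As a rough sketch of that approach, the snippet below writes each (id, item, category) tuple to its own file under a temporary directory, using one subfolder per category. The helper names `write_corpus` and `load_with_nltk`, the sample items, and the directory layout are all assumptions made for illustration. The loader assumes NLTK is installed and relies on `CategorizedPlaintextCorpusReader` with a `cat_pattern` that takes the category from the folder name.

```python
import os
import tempfile


def write_corpus(items, root):
    """Write (id, text, category) tuples to <root>/<category>/<id>.txt."""
    for item_id, text, category in items:
        category_dir = os.path.join(root, category)
        os.makedirs(category_dir, exist_ok=True)
        path = os.path.join(category_dir, f"{item_id}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    return root


def load_with_nltk(root):
    """Open the directory as a categorized corpus (requires NLTK)."""
    # Imported here so the writing step works even without NLTK installed.
    from nltk.corpus.reader import CategorizedPlaintextCorpusReader

    return CategorizedPlaintextCorpusReader(
        root,
        r".+/.+\.txt",               # file ids look like "<category>/<id>.txt"
        cat_pattern=r"(.+)/.+\.txt",  # the folder name is the category
        encoding="utf8",
    )


# Hypothetical sample data standing in for tweets or scraped paragraphs.
items = [
    ("t1", "Just landed in Lisbon.", "tweets"),
    ("t2", "Coffee first, code later.", "tweets"),
    ("p1", "The council approved the new budget on Monday.", "news"),
]

corpus_root = write_corpus(items, tempfile.mkdtemp())
```

With NLTK available, `corpus = load_with_nltk(corpus_root)` should then allow calls such as `corpus.categories()` or `corpus.words(categories="tweets")`. A temporary directory keeps the files out of the way, though they are not cleaned up automatically.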

Jamie Forrest