Is there a way to create a corpus without having to store the items in files? For instance, I want to manipulate Tweets or paragraphs that I am grabbing from the web. Can I do something like
myCorpus = MyCorpus([
('id', 'item', 'category'),
('id', 'item', 'category'),
('id', 'item', 'category'),
... ])
or add items one at a time:
myCorpus.add('id', 'item', 'category')
The goal is to manipulate the corpus with NLTK's existing capabilities. I checked TextCollection,
but it doesn't seem to handle categories.
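To make the request concrete, here is a rough sketch of the interface I have in mind, written in plain Python. Everything in it is hypothetical: MyCorpus, ids(), categories() and words() are names I made up for illustration and are not NLTK classes or methods, and the whitespace split only stands in for a real tokenizer.

```python
from collections import defaultdict


class MyCorpus:
    """Hypothetical in-memory categorized corpus (not part of NLTK)."""

    def __init__(self, items=()):
        self._texts = {}                        # id -> raw text
        self._by_category = defaultdict(list)   # category -> list of ids
        for item_id, text, category in items:
            self.add(item_id, text, category)

    def add(self, item_id, text, category):
        """Store one item under the given category."""
        self._texts[item_id] = text
        self._by_category[category].append(item_id)

    def categories(self):
        """Return all category names, sorted."""
        return sorted(self._by_category)

    def ids(self, category=None):
        """Return all ids, or only those in one category."""
        if category is None:
            return list(self._texts)
        return list(self._by_category.get(category, []))

    def words(self, category=None):
        """Return tokens from the selected items (naive whitespace split)."""
        return [w for i in self.ids(category) for w in self._texts[i].split()]


my_corpus = MyCorpus([
    ("t1", "NLTK makes text processing easy", "tech"),
    ("t2", "The match ended in a draw", "sports"),
])
my_corpus.add("t3", "Python 3 is out", "tech")
```

What I am really after is something like this that NLTK's existing tools would accept directly, so that I don't have to write the items to disk first.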