I am new to Python. I am doing natural language processing on a novel, which I load from the NLTK Gutenberg corpus (its files are listed by nltk.corpus.gutenberg.fileids()). I am only using 'Sense and Sensibility', and I want to analyze each chapter separately. How can I split the whole book into chapters? I notice that books loaded this way have their own format, not plain text like a .txt file.
import nltk
nltk.download('gutenberg')
nltk.corpus.gutenberg.fileids()
When I print the book's words, I get a list of tokens: ['[', 'Sense', 'and', 'Sensibility', 'by', 'Jane', ...]
sense = nltk.Text(nltk.corpus.gutenberg.words('austen-sense.txt'))
print(sense)
This prints in yet another format: <Text: Sense and Sensibility by Jane Austen 1811>. I don't know what it means.
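For context, nltk.Text wraps the list of tokens, and <Text: ...> is only the short label it prints, so neither form keeps the line structure that marks chapters. One possible approach is to take the book as a single string with nltk.corpus.gutenberg.raw() and split it on the chapter headings. The sketch below assumes each chapter starts with a line such as "CHAPTER 1"; the names CHAPTER_HEADING, split_chapters and load_chapters are made up for illustration.

```python
import re

# Assumption: each chapter starts with a line like "CHAPTER 1".
CHAPTER_HEADING = re.compile(r"^CHAPTER[ \t]+\d+[ \t]*$", re.MULTILINE)


def split_chapters(raw_text):
    """Return the chapter bodies, dropping any text before the first heading."""
    parts = CHAPTER_HEADING.split(raw_text)
    # parts[0] is the front matter (title line), so skip it.
    return [part.strip() for part in parts[1:]]


def load_chapters(fileid="austen-sense.txt"):
    """Load one Gutenberg book as a plain string and split it into chapters."""
    # Imported here so split_chapters can be used without nltk installed.
    import nltk

    raw_text = nltk.corpus.gutenberg.raw(fileid)
    return split_chapters(raw_text)
```

Calling load_chapters() should then give one string per chapter, and each string can be tokenized on its own for the per-chapter analysis.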
If I use a separate .txt copy of the book instead, I still don't know how to split it into chapters. I've put the file in my working folder, then:
text = 'senseText.txt'
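Note that this line only stores the file name as a string; the file still has to be opened to get its contents. A minimal sketch of reading it and grouping the lines by chapter follows. It assumes the file is UTF-8 encoded and that headings look like "CHAPTER 1" on a line of their own; HEADING and read_chapters are hypothetical names.

```python
import re
from pathlib import Path

# Assumption: a heading is a line consisting only of "CHAPTER <number>".
HEADING = re.compile(r"CHAPTER[ \t]+\d+")


def read_chapters(path):
    """Map each chapter heading to the text that follows it."""
    chapters = {}
    current = None  # heading of the chapter currently being collected
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if HEADING.fullmatch(stripped):
            current = stripped
            chapters[current] = []
        elif current is not None:
            # Lines before the first heading (title page etc.) are ignored.
            chapters[current].append(line)
    return {heading: "\n".join(lines).strip() for heading, lines in chapters.items()}
```

With the file above, read_chapters('senseText.txt') would return a dictionary keyed by heading, provided the file really uses that heading style.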