You're right that it's quite hard to find the documentation for the book.py
module, so we have to get our hands dirty and look at the code (see here). Looking at book.py,
here is what you need to do to get the concordance and all the fancy stuff from the book module:
Firstly, you have to put your raw texts into NLTK's corpus
class; see Creating a new corpus with NLTK for more details.
Secondly, you read the corpus words into NLTK's Text
class. Then you can use the functions that you see in http://nltk.org/book/ch01.html
from nltk.corpus import PlaintextCorpusReader
from nltk.text import Text
# For example, I create an example text file
text1 = '''
This is a story about a foo bar. Foo likes to go to the bar and his last name is also bar. At home, he kept a lot of gold chocolate bars.
'''
text2 = '''
One day, foo went to the bar in his neighborhood and was shot down by a sheep, a blah blah black sheep.
'''
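# Creating the corpus (make sure the directory exists before writing to it)
import os
corpusdir = './mycorpus/'
os.makedirs(corpusdir, exist_ok=True)
with open(corpusdir + 'text1.txt', 'w') as fout:
    fout.write(text1)
with open(corpusdir + 'text2.txt', 'w') as fout:
    fout.write(text2)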
# Read the example corpus into NLTK's corpus class.
mycorpus = PlaintextCorpusReader(corpusdir, '.*')
# Read NLTK's corpus into NLTK's Text class,
# where your book-like concordance search is available
mytext = Text(mycorpus.words())
mytext.concordance('foo')
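If it's unclear what a concordance search gives you, here is a rough standard-library sketch of the idea (keyword in context). This is not NLTK's implementation; `kwic`, its `width` parameter, and the regex tokenizer are names and choices I made up purely for illustration.

```python
import re

def kwic(tokens, word, width=3):
    """Return (left context, match, right context) for each occurrence of `word`.

    Hypothetical helper: matching is case-insensitive and the context is
    measured in tokens, which is only one of several reasonable choices.
    """
    target = word.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = ' '.join(tokens[max(0, i - width):i])
            right = ' '.join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

sample = 'One day, foo went to the bar in his neighborhood.'
# Crude tokenizer for the sketch: runs of word characters, or single punctuation marks.
tokens = re.findall(r"\w+|[^\w\s]", sample)

for left, tok, right in kwic(tokens, 'foo'):
    print(f'{left:>20} [{tok}] {right}')
```

The Text class does the same kind of lookup for you, with its own formatting of the context window.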
NOTE: you can use NLTK's other CorpusReaders and even specify custom paragraph/sentence/word tokenizers and the encoding, but for now we'll stick to the defaults.
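If you do want to go beyond the defaults, a minimal sketch could look like the one below. It assumes you want to drop punctuation by passing a RegexpTokenizer as the word tokenizer and to read the files as UTF-8; the temporary directory and the sample sentence are made up for the example, so adapt them to your own corpus.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader
from nltk.text import Text
from nltk.tokenize import RegexpTokenizer

# Throwaway corpus directory, only for this sketch.
corpusdir = tempfile.mkdtemp()
with open(os.path.join(corpusdir, 'text1.txt'), 'w', encoding='utf8') as fout:
    fout.write('Foo likes to go to the bar, and his last name is also bar.')

# Keep only runs of word characters, so punctuation never becomes a token.
word_tokenizer = RegexpTokenizer(r'\w+')

mycorpus = PlaintextCorpusReader(
    corpusdir,
    r'.*\.txt',                      # only pick up .txt files
    word_tokenizer=word_tokenizer,   # custom word tokenizer instead of the default
    encoding='utf8',
)

words = list(mycorpus.words())
mytext = Text(words)
mytext.concordance('bar')
```

A sentence tokenizer can be swapped in the same way through the sent_tokenizer argument; I left it out here because the concordance only needs the words.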