I want to process a bunch of text files using NLTK, splitting them on a particular keyword. I am therefore trying to "subclass StreamBackedCorpusView, and override the read_block() method", as suggested by the documentation.
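    from nltk.corpus.reader.plaintext import PlaintextCorpusReader
    from nltk.corpus.reader.util import StreamBackedCorpusView

    class CustomCorpusView(StreamBackedCorpusView):
        def read_block(self, stream):
            block = stream.readline().split()
            print("wtf")  # debugging marker: should appear if this override is called
            return []  # obviously this is only for debugging

    class CustomCorpusReader(PlaintextCorpusReader):
        # Point the reader at the custom view class defined above
        CorpusView = CustomCorpusView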
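To make my expectation concrete, here is a minimal sketch in plain Python of the behaviour I thought I would get. The classes `BaseView`, `CustomView`, `BaseReader`, and `CustomReader` are hypothetical stand-ins I made up for illustration, not NLTK's actual implementation, and the sketch assumes the reader simply instantiates whatever class its class attribute points to.

```python
import io

# Hypothetical stand-ins, NOT NLTK classes: they only illustrate the
# "class attribute names the view class" pattern I am relying on.
class BaseView:
    def read_block(self, stream):
        # Default behaviour: split one line into whitespace-separated tokens.
        return stream.readline().split()

class CustomView(BaseView):
    def read_block(self, stream):
        # Override: consume the line but return a marker instead of its tokens.
        stream.readline()
        return ["custom"]

class BaseReader:
    View = BaseView  # class attribute naming the view class to instantiate

    def words(self, stream):
        # Assumption: the reader builds a view and calls its read_block method.
        return self.View().read_block(stream)

class CustomReader(BaseReader):
    View = CustomView  # swap in the subclass, as I do with CorpusView

print(BaseReader().words(io.StringIO("a b c\n")))    # ['a', 'b', 'c']
print(CustomReader().words(io.StringIO("a b c\n")))  # ['custom']
```

In this toy version the override is picked up, which is what I expected from the NLTK classes as well.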
However, my knowledge of inheritance is rusty, and it seems my override is not being used. The output of
    corpus = CustomCorpusReader("/path/to/files/", ".*")
    print(corpus.words())
is identical to the output of
    corpus = PlaintextCorpusReader("/path/to/files", ".*")
    print(corpus.words())
I guess I'm missing something obvious, but what?