I'm trying to write a custom sentence segmenter in spaCy that returns the whole document as a single sentence.
I wrote a custom pipeline component that does it using the code from here.
I can't get it to work, though: instead of changing the sentence boundaries so the whole document becomes a single sentence, it throws one of two different errors, depending on how the pipeline is set up.
If I create a blank language instance and only add my custom component to the pipeline, I get this error:

    ValueError: Sentence boundary detection requires the dependency parse, which requires a statistical model to be installed and loaded.
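For reference, a minimal sketch of this first case (blank pipeline plus only the custom component, spaCy v2 API; the example text is made up):

```python
import spacy

# Sketch of the failing setup: a blank pipeline with only the
# custom segmenter and no parser (spaCy v2 API).
nlp = spacy.blank('es')

def custom_sbd(doc):
    # Mark the whole document as a single sentence.
    doc[0].sent_start = True
    for i in range(1, len(doc)):
        doc[i].sent_start = False
    return doc

try:
    nlp.add_pipe(custom_sbd, first=True)
    doc = nlp("Una frase. Otra frase.")
    sentences = list(doc.sents)  # raises the ValueError quoted above
except ValueError as err:
    print(err)
```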
If I add the parser component to the pipeline:

    nlp = spacy.blank('es')
    parser = nlp.create_pipe('parser')
    nlp.add_pipe(parser, last=True)

    def custom_sbd(doc):
        print("EXECUTING SBD!!!!!!!!!!!!!!!!!!!!")
        doc[0].sent_start = True
        for i in range(1, len(doc)):
            doc[i].sent_start = False
        return doc

    nlp.begin_training()
    nlp.add_pipe(custom_sbd, first=True)
I get the same error.
If I change the order so the document is parsed first and the sentence boundaries are changed afterwards, the error changes to:

    Refusing to write to token.sent_start if its document is parsed, because this may cause inconsistent state.
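A sketch of this second ordering (parser first, custom component last; again spaCy v2 API, with made-up example text):

```python
import spacy

def custom_sbd(doc):
    # Runs after the parser in this ordering.
    doc[0].sent_start = True  # in spaCy v2, this write is refused on a parsed doc
    for i in range(1, len(doc)):
        doc[i].sent_start = False
    return doc

try:
    nlp = spacy.blank('es')
    parser = nlp.create_pipe('parser')
    nlp.add_pipe(parser, last=True)
    nlp.begin_training()                 # initialize the parser's weights
    nlp.add_pipe(custom_sbd, last=True)  # custom component now runs *after* the parser
    doc = nlp("Una frase. Otra frase.")  # raises inside custom_sbd
except ValueError as err:
    print(err)
```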
So it throws one error when the dependency parse is missing (or when the parser runs after the custom sentence boundary detection), and a different error when the dependency parse runs first. What's the appropriate way to do this?
Thank you!