
I'm using spaCy for custom sentence splitting, and I need to parametrize the custom delimiter (word) used for splitting, but I couldn't find how to pass it as an argument. Here is the function:

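import spacy

nlp = spacy.load('en_core_web_sm')  # any spaCy v2 pipeline that includes a parser

# Manual (rule-based) sentence boundaries
def mycustom_boundary(docx):
    # Skip the last token: there is no following token to mark as a sentence start
    for token in docx[:-1]:
        if token.text == '...':
            docx[token.i+1].is_sent_start = True
    return docx

# Add the rule before the parser so the parser respects these boundaries
nlp.add_pipe(mycustom_boundary, before='parser')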

How can I pass the custom delimiters to this function as a list argument?

Manoj

1 Answer


You could turn your component into a class that is initialized with a list of delimiters. For example:

class MyCustomBoundary(object):
    def __init__(self, delimiters):
        self.delimiters = delimiters

    def __call__(self, doc):  # this is applied when you call it on a Doc
        for token in doc[:-1]:
            if token.text in self.delimiters:
                doc[token.i+1].is_sent_start = True
        return doc

You can then add it to your pipeline like this:

mycustom_boundary = MyCustomBoundary(delimiters=['...', '---'])
nlp.add_pipe(mycustom_boundary, before='parser')
Ines Montani
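If a class feels heavy, the same parametrization can be done with a closure: a factory function takes the delimiter list and returns a plain one-argument function that the pipeline can call with just the Doc. This is a minimal sketch, not part of the answer above. The helper name `make_boundary_component` is hypothetical, and the returned function uses the same `token.text`, `token.i` and `is_sent_start` accesses as the original code.

```python
from typing import Callable, Iterable


def make_boundary_component(delimiters: Iterable[str]) -> Callable:
    """Return a component that starts a new sentence after any delimiter token.

    `make_boundary_component` is a hypothetical helper name used for illustration.
    """
    # Copy into a frozenset: fast membership tests, and later changes to the
    # caller's list do not affect the component.
    delimiter_set = frozenset(delimiters)

    def custom_boundary(doc):
        # Skip the last token: there is no following token to mark as a sentence start.
        for token in doc[:-1]:
            if token.text in delimiter_set:
                doc[token.i + 1].is_sent_start = True
        return doc

    return custom_boundary
```

With a spaCy v2 pipeline it would be added the same way as the class instance: `nlp.add_pipe(make_boundary_component(['...', '---']), before='parser')`. Note that spaCy v3 changed `add_pipe` to take a registered component name rather than a callable, so this callable-passing style applies to v2.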