The following link shows how to add a custom entity rule where the entities span more than one token. The code to do that is below:
import spacy
from spacy.pipeline import EntityRuler
nlp = spacy.load('en_core_web_sm', parse=True, tag=True, entity=True)
animal = ["cat", "dog", "artic fox"]
ruler = EntityRuler(nlp)
for a in animal:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
nlp.add_pipe(ruler)
doc = nlp("There is no cat in the house and no artic fox in the basement")
with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
pattern =[{'lower': 'no'},{'ENT_TYPE': {'REGEX': 'animal', 'OP': '+'}}]
matcher.add('negated animal', None, pattern)
matches = matcher(doc)
for match_id, start, end in matches:
    span = doc[start:end]
    print(span)
I tried it, but I got the error below:
If you created your component with nlp.create_pipe('name'): remove nlp.create_pipe and call nlp.add_pipe('name') instead.
If you passed in a component like TextCategorizer(): call nlp.add_pipe with the string name instead, e.g. nlp.add_pipe('textcat').
If you're using a custom component: Add the decorator @Language.component (for function components) or @Language.factory (for class components / factories) to your custom component and assign it a name, e.g. @Language.component('your_name'). You can then run nlp.add_pipe('your_name') to add it to the pipeline.
How can I fix this, please? NB: spaCy version 3.0.6
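
For reference, here is a minimal sketch of what the second suggestion in the error message seems to point at for spaCy 3.x: pass the string name "entity_ruler" to nlp.add_pipe and add the patterns to the component it returns. A few things here are assumptions rather than part of the code above: spacy.blank("en") stands in for en_core_web_sm so the sketch runs without a model download, matcher.add is called with a list of patterns (the v3 form), and the pattern is simplified to a plain ENT_TYPE match because the entities are already merged into single tokens.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough for this sketch.
# Swapping in spacy.load("en_core_web_sm") should work the same way.
nlp = spacy.blank("en")

# In spaCy 3.x, add the built-in component by its string name;
# add_pipe returns the component instance, so patterns go on that.
ruler = nlp.add_pipe("entity_ruler")
animals = ["cat", "dog", "artic fox"]
ruler.add_patterns([{"label": "animal", "pattern": a} for a in animals])

doc = nlp("There is no cat in the house and no artic fox in the basement")

# Merge each entity into a single token so multi-token entities
# such as "artic fox" can be matched as one token.
with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])

matcher = Matcher(nlp.vocab)
# "no" (case-insensitive) followed by one token labelled "animal".
pattern = [{"LOWER": "no"}, {"ENT_TYPE": "animal"}]
# v3 signature: the patterns are passed as a list of patterns.
matcher.add("negated animal", [pattern])

negated = [doc[start:end].text for _, start, end in matcher(doc)]
print(negated)
```

If the statistical NER in en_core_web_sm labels one of these words differently, the ruler may need to run before it, e.g. nlp.add_pipe("entity_ruler", before="ner"); that is untested here.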