ValueError: nlp.add_pipe now takes the string name of the registered component factory, not a callable component

Question

The following link shows how to add a custom entity rule for entities that span more than one token. The code to do that is below:

import spacy
from spacy.pipeline import EntityRuler
nlp = spacy.load('en_core_web_sm', parse=True, tag=True, entity=True)

animal = ["cat", "dog", "artic fox"]
ruler = EntityRuler(nlp)
for a in animal:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
nlp.add_pipe(ruler)


doc = nlp("There is no cat in the house and no artic fox in the basement")

with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])


from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
pattern =[{'lower': 'no'},{'ENT_TYPE': {'REGEX': 'animal', 'OP': '+'}}]
matcher.add('negated animal', None, pattern)
matches = matcher(doc)


for match_id, start, end in matches:
    span = doc[start:end]
    print(span)

I tried it, but I got the error below:

If you created your component with nlp.create_pipe('name'): remove nlp.create_pipe and call nlp.add_pipe('name') instead.
If you passed in a component like TextCategorizer(): call nlp.add_pipe with the string name instead, e.g. nlp.add_pipe('textcat').
If you're using a custom component: Add the decorator @Language.component (for function components) or @Language.factory (for class components / factories) to your custom component and assign it a name, e.g. @Language.component('your_name'). You can then run nlp.add_pipe('your_name') to add it to the pipeline.

How can I fix this, please? NB: I am using spaCy version 3.0.6.

As a note, you got this error because the question you refer to was written for spaCy 2, but you're using spaCy 3. Also, the error message you pasted here tells you how to fix it. Did you try following the instructions? – polm23, Jun 10 '21 at 03:39

score 15 · Accepted Answer · answered Jun 10 '21 at 07:41


For spaCy v2, the normal way to add an entity ruler looked like this:

ruler = EntityRuler(nlp)
nlp.add_pipe(ruler)
ruler.add_patterns(...)

For spaCy v3, you just want to add it with its string name and skip instantiating the class separately:

ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(...)

See: https://spacy.io/usage/v3#migrating-add-pipe
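
As a minimal end-to-end sketch of the v3 style, the question's patterns can be added like this. It assumes a blank English pipeline (`spacy.blank("en")`) in place of `en_core_web_sm` so that it runs without a model download; the entity ruler only needs the tokenizer.

```python
import spacy

# Assumption: a blank English pipeline stands in for en_core_web_sm here,
# so the sketch runs without downloading a trained model.
nlp = spacy.blank("en")

# v3 style: pass the factory name; add_pipe creates and returns the component.
ruler = nlp.add_pipe("entity_ruler")

animals = ["cat", "dog", "artic fox"]  # spelling kept from the question
ruler.add_patterns([{"label": "animal", "pattern": a} for a in animals])

doc = nlp("There is no cat in the house and no artic fox in the basement")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

With a trained pipeline such as `en_core_web_sm`, the ruler is appended after the existing components by default, so the statistical `ner` component runs first and the ruler only fills in spans that do not overlap its entities.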

answered Jun 10 '21 at 07:41

aab


How do I pass the extra arguments of EntityRuler? – Wang Jul 31 '21 at 16:45
You pass a dict of config values as `config`: https://spacy.io/usage/v3#migrating-configure-pipe – aab Aug 01 '21 at 08:42
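
A short sketch of what passing `config` could look like for the entity ruler. The two settings shown, `phrase_matcher_attr` and `overwrite_ents`, are chosen here only for illustration, and the blank English pipeline is an assumption made to keep the example self-contained.

```python
import spacy

nlp = spacy.blank("en")  # assumption: blank pipeline, to avoid a model download

# Settings that used to be EntityRuler(...) keyword arguments go into `config`.
ruler = nlp.add_pipe(
    "entity_ruler",
    config={"phrase_matcher_attr": "LOWER", "overwrite_ents": True},
)
ruler.add_patterns([{"label": "animal", "pattern": "cat"}])

# With phrase_matcher_attr="LOWER", string patterns match case-insensitively.
doc = nlp("A Cat sat on the mat")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```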

Wiktor Stribiżew · Answer 2 · 2021-06-09T17:49:55.080

4

You need to define your own method to instantiate the entity ruler:

def get_ent_ruler(nlp, name):
    ruler = EntityRuler(nlp)
    for a in animal:
        ruler.add_patterns([{"label": "animal", "pattern": a}])
    return ruler

Then, you may use it the following way:

from spacy.language import Language
Language.factory("ent_ruler", func=get_ent_ruler)
nlp.add_pipe("ent_ruler", last=True)

Also, note the pattern you wrote is not valid. I think you can fix it like this:

pattern =[{'lower': 'no'},{'ENT_TYPE': 'animal'}]

Then, the result is

no cat
no artic fox
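
One more v3 change is worth checking in the question's code: `Matcher.add` takes a list of patterns as its second argument, so the v2-style call `matcher.add('negated animal', None, pattern)` needs updating as well. The sketch below puts the corrected pattern together with that call. It uses the built-in `entity_ruler` factory and a blank English pipeline, both of which are assumptions made to keep the example short and runnable without a model download.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: blank pipeline instead of en_core_web_sm
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [{"label": "animal", "pattern": a} for a in ["cat", "dog", "artic fox"]]
)

doc = nlp("There is no cat in the house and no artic fox in the basement")

# Merge each entity into one token so a single ENT_TYPE token covers it.
with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])

matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "no"}, {"ENT_TYPE": "animal"}]
# v3 signature: the second argument is a list of patterns.
matcher.add("negated_animal", [pattern])

negated = [doc[start:end].text for _, start, end in matcher(doc)]
print(negated)
```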

edited Jun 09 '21 at 17:49

answered Jun 09 '21 at 17:38

Wiktor Stribiżew


This works in a short script, but causes a lot of unnecessary headaches if you want to save and reload the model. If you're using a built-in component like `entity_ruler` that already has a factory, it's better to just use that factory name with `nlp.add_pipe("entity_ruler")`. – aab Jun 10 '21 at 07:40
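
To illustrate the point about saving and reloading, here is a small round-trip sketch with the built-in factory. The temporary directory and the blank English pipeline are assumptions for the sake of a self-contained example. Because `entity_ruler` is a registered built-in, the reloaded pipeline can rebuild the component from the saved config without any extra registration code.

```python
import tempfile

import spacy

nlp = spacy.blank("en")  # assumption: blank pipeline, to avoid a model download
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "animal", "pattern": "cat"}])

# Save the pipeline (config plus the ruler's patterns) and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    nlp.to_disk(tmp_dir)
    reloaded = spacy.load(tmp_dir)

reloaded_ents = [(ent.text, ent.label_) for ent in reloaded("no cat here").ents]
print(reloaded.pipe_names, reloaded_ents)
```

With a custom factory name such as `ent_ruler`, the same `spacy.load` call would only work if the registration code had already been run in the loading process.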

score 2 · Answer 3 · edited Feb 04 '22 at 13:31


For spaCy 3.0+, a custom function component should be registered and added like this:

import spacy
import re
from spacy.language import Language

nlp = spacy.load('en_core_web_sm')
boundary = re.compile('^[0-9]$')

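@Language.component("component")
def custom_seg(doc):
    # Keep a list marker such as "1." in the same sentence as the text after it.
    prev = doc[0].text
    length = len(doc)
    for index, token in enumerate(doc):
        if token.text == '.' and boundary.match(prev) and index != length - 1:
            # is_sent_start is the documented attribute for sentence boundaries.
            doc[index + 1].is_sent_start = False
        prev = token.text
    return doc

# Add the component by its registered name so it runs before the parser.
nlp.add_pipe("component", before='parser')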

edited Feb 04 '22 at 13:31

Park


answered Feb 04 '22 at 09:10

Zhiwei Yang


text = u'This is first sentence.\nNext is numbered list.\n1. Hello World!\n2. Hello World2!\n3. Hello World!'
doc = nlp(text)
for sentence in doc.sents:
    print(sentence.text)
– Zhiwei Yang Feb 04 '22 at 09:11
