How can I prioritize Rule Based Matching over trained NER Model in Spacy?

Question

I am building a Named Entity Recognition model for biomedical text (cancer papers from PubMed). I trained a custom NER model using spaCy for three entity types (DISEASE, GENE, and DRUG). I then combined the model with rule-based components to improve its accuracy.

Here is my current code -


import spacy
from spacy.pipeline import EntityRuler

# Load the trained NER model
nlp = spacy.load("my_spacy_model")

# Entity patterns for the EntityRuler (only 2 relevant patterns shown here; the full list contains more)
patterns = [{"label": "GENE", "pattern": "BRCA1"},
            {"label": "GENE", "pattern": "BRCA2"}]

ruler = EntityRuler(nlp)
ruler.add_patterns(patterns)

# With no position argument, the ruler is appended at the end of the pipeline (after the trained NER component)
nlp.add_pipe(ruler)

When I test the above code on the following piece of text -

text = "Exceptional response to olaparib in BRCA2-altered breast cancer after PD-L1 inhibitor and chemotherapy failure"

I get the following result -

DISEASE  BRCA2-altered breast cancer
DRUG  olaparib
GENE PD-L1

However, the correct answer is -

GENE BRCA2
^^^^^^^^^^^
DISEASE breast cancer
^^^^^^^^^^^^^^^^^^^^^
DRUG  olaparib
GENE PD-L1

The model is not recognizing BRCA2 as a gene, even though I have added it to the patterns for the EntityRuler.

Is there a way to prioritize predictions from rule-based matching over the trained model? Alternatively, is there something else I can do to get the correct results by combining rule-based matching?

score 6 · Accepted Answer · answered Aug 29 '19 at 06:46

You can either add the EntityRuler before the NER component in the pipeline:

nlp.add_pipe(ruler, before="ner")

Or tell the EntityRuler to overwrite existing entities:

ruler = EntityRuler(nlp, overwrite_ents=True)

The NER predictions might be slightly different in each case, because in the first option, the model's predictions might change given the presence of existing entity spans.

aab


I already tried adding the EntityRuler before the NER component. That helps for the particular case that I shared in my question; however, the updated model is no longer able to tag entities that it identified earlier. It learns the rule-based entities that I supply but forgets many entities that it was tagging previously. How can I overcome this issue? – iCHAIT Aug 29 '19 at 06:55
I tried using `overwrite_ents = True` and it gave me the following results: DISEASE breast cancer, DRUG olaparib, GENE PD-L1. It is not able to recognize `BRCA2` as a gene. I think that is because in the given sentence `BRCA2-altered` is a single word and I don't have a rule for that. Can you explain a bit about what `overwrite_ents = True` is doing? I read the relevant [doc](https://spacy.io/api/entityruler#init), but couldn't understand what it is doing. – iCHAIT Aug 29 '19 at 07:22
Then I think `overwrite_ents = True` is the better option for you. Ines gives a good explanation here: https://github.com/explosion/spaCy/issues/3775 – aab Aug 29 '19 at 07:24
If any part of the EntityRuler entity overlaps with an existing entity (even partially), the EntityRuler removes the existing entity and adds the new EntityRuler one. – aab Aug 29 '19 at 07:26
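
The overlap rule described in the comment above can be sketched in plain Python. This is only an illustration of the stated behavior, not spaCy's implementation: the `overwrite_entities` helper, the `(start, end, label)` tuple layout, and the token offsets are all hypothetical.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); hypothetical layout for illustration
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def overwrite_entities(existing: List[Span], ruled: List[Span]) -> List[Span]:
    """Drop every existing entity that overlaps a rule-based one, then add the rule-based ones."""
    kept = [e for e in existing if not any(overlaps(e, r) for r in ruled)]
    return sorted(kept + ruled)


# Hypothetical model predictions: a DRUG span and a long DISEASE span
existing = [(3, 4, "DRUG"), (5, 10, "DISEASE")]
# Hypothetical rule match that covers only the first token of the DISEASE span
ruled = [(5, 6, "GENE")]

merged = overwrite_entities(existing, ruled)
print(merged)  # [(3, 4, 'DRUG'), (5, 6, 'GENE')]
```

Note that under this rule the overlapping DISEASE span is removed entirely rather than trimmed, so the rest of that span ends up with no label at all.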
Thanks for the link and the explanation. I have one follow up question though, using `overwrite_ents = True` it gives me the following results - ``` DISEASE breast cancer DRUG olaparib GENE PD-L1 ``` It is not able to recognize `BRCA2` as a gene. I think that is because in the given text `BRCA2-altered` acts as one word. And since I don't have that in my patterns, it does not tag it. How can I work around that? – iCHAIT Aug 29 '19 at 07:28
I obviously can't add `BRCA2-altered` to my patterns list since that is not a valid GENE. How can I extract and tag BRCA2 here? – iCHAIT Aug 29 '19 at 07:36
To do this cleanly in spaCy you would have to change the tokenization. Changing the tokenizer to split on every `-` might cause headaches elsewhere, though. Here's how to customize infixes in the tokenizer: https://stackoverflow.com/a/57304882/461847 – aab Aug 29 '19 at 12:58
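
A minimal sketch of the infix customization suggested in the last comment, under a few assumptions: `spacy.blank("en")` stands in for the trained model so the snippet runs without it, and the extra infix rule (split a hyphen only when it sits between a digit and a letter) is an illustrative choice meant to be narrower than splitting on every `-`. The ruler is applied directly to the `Doc` here instead of being added to the pipeline.

```python
import spacy
from spacy.pipeline import EntityRuler
from spacy.util import compile_infix_regex

# Assumption: a blank English pipeline stands in for the trained model
nlp = spacy.blank("en")

# Assumption: an extra infix rule that splits "-" only between a digit and a letter,
# so "BRCA2-altered" becomes "BRCA2", "-", "altered"
extra_infix = r"(?<=[0-9])-(?=[a-zA-Z])"
infixes = list(nlp.Defaults.infixes) + [extra_infix]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

patterns = [{"label": "GENE", "pattern": "BRCA1"},
            {"label": "GENE", "pattern": "BRCA2"}]
ruler = EntityRuler(nlp)
ruler.add_patterns(patterns)

text = ("Exceptional response to olaparib in BRCA2-altered breast cancer "
        "after PD-L1 inhibitor and chemotherapy failure")

# Tokenize only, then apply the ruler directly to the Doc
doc = ruler(nlp.make_doc(text))
tokens = [t.text for t in doc]
ents = [(e.text, e.label_) for e in doc.ents]
print(ents)  # [('BRCA2', 'GENE')]
```

With a trained pipeline, changing the tokenizer also changes the tokens that the statistical NER component sees, so its predictions may shift and are worth re-checking on held-out text.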
