
There are a few instances where spaCy tags an ORG instead of the GPE I am looking for. I am not sure how to 'turn off' tagging ORG so that it will only look for GPE, or if there is a way to prioritize GPE first.

import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')
doc = nlp('Is there a way to bypass the ORG tag for the Los Angeles Lakers and only tag Los Angeles')
displacy.render(doc, style="ent")

In that example, when 'Los Angeles Lakers' appears together it is tagged as an ORG, but what I really want is the GPE 'Los Angeles'. Another example is 'Seattle Seahawks': I am looking for the GPE 'Seattle', but I get the ORG.

Logan

1 Answer


No, you can't do this. The models internalize the relations of the tags to each other, so they can't just ignore one tag during inference.

This is a case of nested entities, where in some sense the right thing is that "Los Angeles" is a GPE and "Los Angeles Lakers" is an ORG at the same time. But that's hard to model and often not even useful, so most models don't do it and most datasets don't support it. So in this case the model has been trained to specifically do the thing you don't want it to do.

If you are looking for real locations that are major American cities, you might be better off just using an EntityRuler to match a big list of place names.
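As a rough sketch of the EntityRuler idea, assuming spaCy v3: the example below uses a blank English pipeline with only the ruler, so no statistical NER is involved at all. The `CITY_NAMES` list is a made-up stand-in for whatever big list of place names you end up using.

```python
import spacy

# Hypothetical list of place names; in practice this would be much larger.
CITY_NAMES = ["Los Angeles", "Seattle"]

# A blank pipeline has a tokenizer but no trained components,
# so the only entities produced are the ones the ruler matches.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "GPE", "pattern": name} for name in CITY_NAMES])

doc = nlp("The Los Angeles Lakers and the Seattle Seahawks are both sports teams.")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

If you would rather keep the pretrained model's other predictions, you could try adding the ruler to `en_core_web_lg` with `nlp.add_pipe("entity_ruler", before="ner")`. The NER component is documented to respect entities that are already set, but what it then does with the leftover "Lakers" token is something you would need to check on your own data.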

Your other alternative is to train a model from scratch. If it's just for GPEs that might not be too bad, but it would still be pretty involved.

polm23