Spacy Matcher weirdness

Question

I tried the following pattern in the "Rule-based Matcher Explorer" demo on the Explosion website:

pattern = [{'LEMMA': 'museum'}]

The text is:

museums in madrid

This works fine. Then I do the following in code:

import spacy
from spacy.matcher import Matcher


nlp = spacy.load("en_core_web_sm")

matcher = Matcher(nlp.vocab)

matcher.add("tourism", None, [{'LEMMA': 'museum'}])

doc = nlp("museums in madrid")
matches = matcher(doc)

print(matches)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)

And there is no result! The funny thing is that if the text does not contain the word "madrid", it does find a match. Can anyone explain what is going on, and why everything works on the website?

score 0 · Answer 1 · answered Mar 16 '20 at 09:22

The lemmas depend on the POS tags, which may change for the same words in different contexts (especially for very short texts). Inspect the POS tags and lemmas for your sample texts to see why the patterns are or aren't matching.
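A minimal sketch of that inspection, assuming any loaded spaCy pipeline is passed in. The helper name token_table is made up for illustration; token.pos_ and token.lemma_ are the standard token attributes.

```python
import spacy


def token_table(nlp, text):
    """Return (text, coarse POS tag, lemma) for each token in `text`."""
    return [(token.text, token.pos_, token.lemma_) for token in nlp(text)]


def print_token_table(nlp, text):
    """Print one aligned row per token so the two sample texts can be compared."""
    for word, pos, lemma in token_table(nlp, text):
        print(f"{word:<12} {pos:<8} {lemma}")
```

With nlp = spacy.load("en_core_web_sm"), calling print_token_table(nlp, "museums in madrid") and print_token_table(nlp, "museums") lets you compare the tag and lemma assigned to "museums" in each text. The exact tags depend on the model version, so treat whatever you see as specific to your install.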

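For the lemmatizer, what matters here is whether "museums" is tagged NOUN or PROPN. As a NOUN it is lemmatized to "museum"; as a PROPN it typically keeps its surface form, so the LEMMA pattern no longer matches. With such a short text the tagger can't be sure whether "museums" might be a proper noun, as in "The travel guide publisher announced a new guide 'Museums of Madrid'..."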
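If the pattern should not depend on the tagger at all, one possible workaround (my assumption, not something the demo does) is to match on the lowercased token text instead of the lemma. The sketch below uses a blank English pipeline, the pattern key "tourism" from the question, and a hypothetical helper find_matches. It uses the matcher.add(key, [pattern]) form, which should work on recent spaCy versions; on older v2 releases the call is matcher.add(key, None, pattern), as in the question.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline only tokenizes, so no POS tags or lemmas are involved.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Match either surface form explicitly, compared case-insensitively via LOWER.
pattern = [{"LOWER": {"IN": ["museum", "museums"]}}]
matcher.add("tourism", [pattern])


def find_matches(text):
    """Return the text of every span matched by the 'tourism' pattern."""
    doc = nlp(text)
    return [doc[start:end].text for _match_id, start, end in matcher(doc)]
```

The trade-off is that every inflected form has to be listed by hand, which is fine for a handful of keywords but does not scale the way a lemma pattern does.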
