I was reproducing a spaCy rule-matching example:

import spacy 
from spacy.matcher import Matcher 

nlp = spacy.load("en_core_web_md")
doc = nlp("Good morning, I'm here. I'll say good evening!!")
pattern = [{"LOWER": "good"},{"LOWER": {"IN": ["morning", "evening"]}},{"IS_PUNCT": True}] 
matcher.add("greetings", [pattern]) # good morning/evening with one pattern with the help of IN as follows
matches = matcher(doc)
for mid, start, end in matches:
    print(start, end, doc[start:end])

This is supposed to match only the two greetings:

Good morning
good evening!

But the code above also matches "I" in two places:

0 3 Good morning,
3 4 I
7 8 I
10 13 good evening!

I just want to exclude the "I" tokens from the matches.

Thank you

Aureon

1 Answer

When I run your code on my machine (Windows 11 64-bit, Python 3.10.9, spaCy 3.4.4 with both the en_core_web_sm and en_core_web_trf pipelines), it raises a NameError because matcher is not defined. After defining matcher as an instance of the Matcher class, as shown in the spaCy Matcher documentation, I get the following (desired) output with both pipelines:

0 3 Good morning,
10 13 good evening!

The full working code is shown below. If you still see the unexpected results, I'd suggest restarting your Python session, IDE and/or computer, since a matcher left over from earlier code may still hold other patterns.

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
doc = nlp("Good morning, I'm here. I'll say good evening!!")
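# Create a new, empty matcher that shares the pipeline's vocabulary
matcher = Matcher(nlp.vocab)
# "good" + ("morning" or "evening") + one punctuation token, case-insensitive
pattern = [{"LOWER": "good"}, {"LOWER": {"IN": ["morning", "evening"]}}, {"IS_PUNCT": True}]
matcher.add("greetings", [pattern])  # IN lets one pattern cover both greetings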
matches = matcher(doc)
for match_id, start, end in matches:
    print(start, end, doc[start:end])
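
One possible explanation for the stray "I" matches, which I can't confirm from the question alone, is a matcher object that still carried a rule from earlier code in the same session. The sketch below imitates that situation with a made-up "leftover" rule (the rule name and its pattern are assumptions for illustration, not something taken from your code) and prints the rule name next to each match so you can see which rule produced it. It uses spacy.blank("en") only so that it runs without a downloaded model; LOWER and IS_PUNCT are lexical attributes, so the tokenizer alone is enough here.

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only English pipeline; no trained model is required for this pattern.
nlp = spacy.blank("en")
doc = nlp("Good morning, I'm here. I'll say good evening!!")

greeting = [{"LOWER": "good"}, {"LOWER": {"IN": ["morning", "evening"]}}, {"IS_PUNCT": True}]


def labelled_matches(matcher, doc):
    """Return sorted (rule_name, start, end, text) tuples for every match."""
    return sorted(
        ((nlp.vocab.strings[match_id], start, end, doc[start:end].text)
         for match_id, start, end in matcher(doc)),
        key=lambda m: (m[1], m[2]),
    )


# Hypothetical stale matcher: "leftover" stands in for a rule added by earlier code.
stale = Matcher(nlp.vocab)
stale.add("leftover", [[{"LOWER": "i"}]])
stale.add("greetings", [greeting])
stale_matches = labelled_matches(stale, doc)

# A freshly created matcher holds only the rule added here.
fresh = Matcher(nlp.vocab)
fresh.add("greetings", [greeting])
fresh_matches = labelled_matches(fresh, doc)

for name, start, end, text in stale_matches:
    print("stale:", name, start, end, text)
for name, start, end, text in fresh_matches:
    print("fresh:", name, start, end, text)

# len(matcher) is the number of rules; `in` checks whether a rule name is registered.
print(len(stale), len(fresh), "leftover" in stale, "leftover" in fresh)
```

If the rule name printed next to "I" is not "greetings", the extra matches come from another rule on the same matcher. In that case, creating a new Matcher(nlp.vocab), or calling matcher.remove with the unwanted rule's name, should leave only the greeting matches.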
Kyle F Hartzenberg