
I would like to give a certain match rule priority in spaCy's Matcher. For example, in the sentence "There is no apple or is there an apple?", I would like to give "no apple" priority: if it occurs even once, no string_id should be returned. Currently I use one rule whose patterns try to cover both the "no apple" case and plain "apple". Here is a reproducible example:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

pattern = [
    [{"LOWER": {"NOT_IN": ["no"]}}, {"LOWER": "apple"}],
    [{"LOWER": "apple"}]
]

matcher.add("apple", pattern)

doc = nlp("There is no apple or is there an apple?")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  
    span = doc[start:end] 
    print(match_id, string_id, start, end, span.text)

Output:

8566208034543834098 apple 3 4 apple
8566208034543834098 apple 7 9 an apple
8566208034543834098 apple 8 9 apple

It now matches "apple" multiple times, because the second pattern in the list matches every "apple" on its own, including the one after "no". One option is to create a separate rule specifically for "no", like this:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
 
pattern = [
    [{"LOWER": "apple"}],
]
no_pattern = [
    [{"LOWER": "no"}, {"LOWER": "apple"}],
]

matcher.add("apple", pattern)
matcher.add("no_apple", no_pattern)

doc = nlp("There is no apple or is there an apple?")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  
    span = doc[start:end] 
    print(match_id, string_id, start, end, span.text)

Output:

14541201340755442066 no_apple 2 4 no apple
8566208034543834098 apple 3 4 apple
8566208034543834098 apple 8 9 apple

Now "no apple" shows up as its own match, which can be used to decide the outcome. But I was wondering whether it is possible to tell spaCy to prioritize one pattern over another. That would avoid having to create multiple rules.
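To make concrete what I mean by "priority", here is a minimal sketch in plain Python of the behaviour I am after, applied to the `(string_id, start, end)` tuples printed above. The helper name `prioritize` and its `priority` argument are my own invention, not part of spaCy, and the final "veto" rule is only one reading of "return no string_id". As far as I can tell, the `greedy` option of `matcher.add` only filters overlaps within a single rule key, so it would not rank one rule above another.

```python
def prioritize(matches, priority):
    """Keep non-overlapping matches, letting higher-priority rules claim tokens first.

    matches:  iterable of (string_id, start, end) tuples, end exclusive
    priority: rule names ordered from most to least important (hypothetical argument)
    """
    rank = {name: i for i, name in enumerate(priority)}
    # Most important rule first; ties go to the longer span, then the earlier one.
    ordered = sorted(matches, key=lambda m: (rank[m[0]], -(m[2] - m[1]), m[1]))

    taken = set()  # token indices already claimed by a kept match
    kept = []
    for string_id, start, end in ordered:
        tokens = range(start, end)
        if taken.isdisjoint(tokens):
            kept.append((string_id, start, end))
            taken.update(tokens)
    # Return the survivors in document order.
    return sorted(kept, key=lambda m: m[1])


# The three matches printed above for "There is no apple or is there an apple?"
matches = [("no_apple", 2, 4), ("apple", 3, 4), ("apple", 8, 9)]
resolved = prioritize(matches, priority=["no_apple", "apple"])
print(resolved)  # [('no_apple', 2, 4), ('apple', 8, 9)]

# Assumed veto rule: a single "no_apple" hit means no string_id is returned.
labels = {string_id for string_id, _, _ in resolved}
if "no_apple" in labels:
    result = None
elif "apple" in labels:
    result = "apple"
else:
    result = None
print(result)  # None
```

With the real matcher, the tuples would be built from `matcher(doc)` using `nlp.vocab.strings[match_id]` as in the loops above. `spacy.util.filter_spans` does something similar for overlapping spans, but it prefers the longest span rather than a named rule, so it only gives the same result here because "no apple" happens to be longer than "apple".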

Quinten
  • this may help [How to avoid double-extracting of overlapping patterns in SpaCy with Matcher?](https://stackoverflow.com/questions/63302027/how-to-avoid-double-extracting-of-overlapping-patterns-in-spacy-with-matcher) – deadshot Jun 28 '23 at 07:51
  • Hi @deadshot, thank you for your suggestion! Would you be able to post an answer? Thanks in advance! – Quinten Jun 28 '23 at 10:50
