I would like to give a certain match rule priority in spaCy's Matcher. For example, in the sentence "There is no apple or is there an apple?", I would like to give "no apple" priority. So if that occurs even once, it should return no string_id. Currently I use one set of patterns to check for both "no apple" and "apple". Here is a reproducible example:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

pattern = [
    # "apple" preceded by any token other than "no"
    [{"LOWER": {"NOT_IN": ["no"]}}, {"LOWER": "apple"}],
    # bare "apple"
    [{"LOWER": "apple"}],
]
matcher.add("apple", pattern)

doc = nlp("There is no apple or is there an apple?")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # rule name, e.g. "apple"
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)
Output:
8566208034543834098 apple 3 4 apple
8566208034543834098 apple 7 9 an apple
8566208034543834098 apple 8 9 apple
This matches "apple" multiple times because of the second pattern in the list. One option would be to create a separate rule specifically for "no", like this:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# bare "apple"
pattern = [
    [{"LOWER": "apple"}],
]
# "no" directly followed by "apple"
no_pattern = [
    [{"LOWER": "no"}, {"LOWER": "apple"}],
]
matcher.add("apple", pattern)
matcher.add("no_apple", no_pattern)

doc = nlp("There is no apple or is there an apple?")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # "apple" or "no_apple"
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)
Output:
14541201340755442066 no_apple 2 4 no apple
8566208034543834098 apple 3 4 apple
8566208034543834098 apple 8 9 apple
Now "no apple" shows up as its own match, which I can use to decide the outcome. But I was wondering whether it is possible to tell spaCy to prioritize one pattern over another? That would save me from having to create multiple patterns.
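For context, this is roughly the manual post-processing I am hoping to avoid. It is only a sketch, not something spaCy provides: `drop_overlapped` and its `priority` parameter are hypothetical names I made up, and the input is assumed to be a list of `(string_id, start, end)` tuples like the ones printed above, so it runs without loading a model.

```python
from typing import List, Tuple

# (string_id, start, end), mirroring the columns printed in the output above
Match = Tuple[str, int, int]


def drop_overlapped(matches: List[Match], priority: str = "no_apple") -> List[Match]:
    """Keep every priority match and drop other matches that overlap one.

    Hypothetical helper for illustration; not part of spaCy.
    """
    # Token ranges claimed by the priority rule
    blocked = [(start, end) for label, start, end in matches if label == priority]

    kept = []
    for label, start, end in matches:
        # Two half-open ranges overlap if each starts before the other ends
        overlaps = any(start < b_end and b_start < end for b_start, b_end in blocked)
        if label == priority or not overlaps:
            kept.append((label, start, end))
    return kept


matches = [("no_apple", 2, 4), ("apple", 3, 4), ("apple", 8, 9)]
result = drop_overlapped(matches)
print(result)
```

Here the "apple" match at 3-4 is dropped because it sits inside "no apple", while the one at 8-9 survives. If a single "no apple" should suppress every result instead, the check would collapse to returning an empty list whenever any match carries the priority label.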