Not too sure exactly how to word the problem, so thank you for indulging the title...
I'm using spaCy's Matcher to parse clauses (adverbial, prepositional, etc.) as part of pre-processing. Some of these clauses are fairly complex, and it would be impossible to write strict rules for every instance. Consequently, I have used {'OP': '*'} in my Matcher patterns to account for the tokens that I cannot manually write rules for. My issue is that each clause type cannot permit certain token types. I would like to create a rule within my pattern that permits all token types except for particular ones that I specify.
Simplified version of my current Matcher for Adjectival Clauses:
pattern = [{'TAG': ',', 'OP': '+'},
{'DEP': 'det', 'OP': '*'},
{'DEP': 'det', 'OP': '*'},
{'DEP': 'amod', 'OP': '+'},
{'OP': '*'},
{'TAG': '.', 'OP': '+'}]
GOAL: Maintain the core structure of the pattern while being able to exclude "ROOT" dependencies, because including tokens with a "ROOT" dependency creates false matches.
I have tried adding {'DEP': 'ROOT', 'OP': '!'} to create an exception for {'OP': '*'}. The resulting code looks like this:
pattern = [{'TAG': ',', 'OP': '+'},
{'DEP': 'det', 'OP': '*'},
{'DEP': 'det', 'OP': '*'},
{'DEP': 'amod', 'OP': '+'},
{'OP': '*'},
{'DEP': 'ROOT', 'OP': '!'},
{'TAG': '.', 'OP': '+'}]
I expected the matcher to initially accept the unwanted token through the wildcard, then reject the match once it hit the {'DEP': 'ROOT', 'OP': '!'} rule. The goal is to match the clause in sentence (1) and not match anything in sentence (2):
(1) "It has started a revolution, this merry band." (2) "And yes, this merry band isn’t all happy or all dudes."
As far as I'm aware, {'OP': '*'} is the only rule that will accept all tokens and {'DEP': 'ROOT', 'OP': '!'} is the only rule that negates tokens. I've tried changing the order of the two rules, but that hasn't helped either.
If anyone knows of a way to use the {'OP': '*'} rule while also restricting specific token types, I would greatly appreciate it. Thank you!
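To make the goal concrete, here is a rough sketch of the kind of rule I am after. It assumes the installed spaCy version supports the NOT_IN set operator for attribute values (v2.1 and later, as far as I can tell from the docs), and it puts the restriction directly on the wildcard token. As I read the docs, '!' only requires a pattern to match zero times at a single position, so it may not constrain the tokens that the preceding {'OP': '*'} consumes. The key "ADJ_CLAUSE" and the helper build_matcher are names made up for illustration, and whether sentence (2) is actually rejected depends on the parser labelling its main verb as ROOT.

```python
# Sketch only: the NOT_IN set operator is assumed to be available
# (spaCy v2.1+); "ADJ_CLAUSE" and build_matcher are hypothetical names.
pattern = [
    {"TAG": ",", "OP": "+"},
    {"DEP": "det", "OP": "*"},
    {"DEP": "det", "OP": "*"},
    {"DEP": "amod", "OP": "+"},
    # Wildcard run that accepts any token whose dependency is not ROOT.
    {"DEP": {"NOT_IN": ["ROOT"]}, "OP": "*"},
    {"TAG": ".", "OP": "+"},
]


def build_matcher(vocab):
    """Register the pattern on a new Matcher (spaCy v3 call signature)."""
    from spacy.matcher import Matcher  # imported lazily; requires spaCy

    matcher = Matcher(vocab)
    # spaCy v2 used matcher.add(key, None, pattern) instead.
    matcher.add("ADJ_CLAUSE", [pattern])
    return matcher
```

With a loaded pipeline (for example nlp = spacy.load("en_core_web_sm"), assuming that model is installed), build_matcher(nlp.vocab)(doc) would return the candidate spans for a parsed doc.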