I'm trying to find verbs in a sentence with Python for an NLP problem. I found an old answer here on Stack Overflow that works with the deprecated pos_regex_matches. With the new matches function I have an annoying problem: it returns every match, not only the longest one (which is what pos_regex_matches does).
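import textacy.extract

# Old-style POS regex for the deprecated pos_regex_matches
pattern = r'<VERB>*<ADV>*<VERB>+<PART>*'
# The same pattern as a list of token dicts for the new matches function
verb_pattern = [{"POS": "VERB", "OP": "*"}, {"POS": "ADV", "OP": "*"}, {"POS": "VERB", "OP": "+"}, {"POS": "PART", "OP": "*"}]
t_list_1 = textacy.extract.pos_regex_matches(text, pattern)
t_list_2 = textacy.extract.matches(text, verb_pattern)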
As you can see, the pattern is the same; the one passed to matches is just written in the new format.
For example, the old pos_regex_matches returns "was celebrating", while the new matches returns both "was" and "was celebrating".
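To make the behaviour I'm after concrete, here is a minimal sketch of a longest-match filter written over plain (start, end) token offsets rather than textacy objects. The function name keep_longest and the offsets are made up for the example; they are not part of textacy. spaCy's spacy.util.filter_spans seems to do something similar for Span objects, so it might be usable on the output of matches, but I haven't verified that here.

```python
def keep_longest(spans):
    """Keep the longest spans, dropping any span that overlaps one already kept.

    `spans` is a list of (start, end) token offsets, with `end` exclusive.
    """
    # Consider the longest spans first; ties go to the earliest start.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    taken = set()  # token indices already covered by a kept span
    for start, end in ordered:
        if not any(i in taken for i in range(start, end)):
            kept.append((start, end))
            taken.update(range(start, end))
    return sorted(kept)


# Hypothetical offsets: "was" is (1, 2) and "was celebrating" is (1, 3).
matches = [(1, 2), (1, 3)]
print(keep_longest(matches))  # [(1, 3)]
```

With a filter like this, the shorter "was" would be dropped because it overlaps the longer "was celebrating".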
Has anyone encountered the same problem? Is it a problem with the pattern or with textacy?
Thanks in advance