How do you use Textacy's pos_regex_matches() function to find subject-verb-object triples using its pseudo-regular-expression syntax? And yes, I'm aware of textacy.extract.subject_verb_object_triples(), but that function is inaccurate and finds very few triples, so I'm trying to build something more robust.
For the text:
text = "He recently wrote the sky is full of stars."
I'm trying:
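import textacy
import textacy.extract

svo_pattern = r'<DET>? <NOUN|PROPN|PRON>+ <VERB>?<ADV>*<VERB>+ <DET>? <NOUN|PROPN|PRON>+'
doc = textacy.Doc(text)  # wraps a spaCy parse of `text` (defined above)
for sent in doc.sents:  # iterate over the sentences of the parsed doc
    matches = list(textacy.extract.pos_regex_matches(sent, svo_pattern))
    print(matches)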
but it doesn't find anything. What's the flaw in my pattern? I've played with several variations of it, but nothing matches.
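To check whether the pattern itself is sound, independently of textacy and spaCy's tagger, here is a minimal pure-Python sketch that translates a pseudo-regex of this kind into an ordinary regular expression and runs it over a space-separated string of POS tags. The helpers `pos_pattern_to_regex` and `pos_regex_spans` are hypothetical; this is not textacy's implementation. The tags for the example sentence are assigned by hand, which is an assumption (spaCy might, for instance, tag "is" as AUX or VERB depending on version).

```python
import re


def pos_pattern_to_regex(pattern: str) -> str:
    """Hypothetical helper: turn '<DET>? <NOUN|PRON>+' style syntax into a regex.

    Each <TAG|TAG|...> becomes a non-capturing group that matches exactly one
    space-prefixed tag, so quantifiers such as ?, * and + apply per token.
    Any number of |-separated alternatives is accepted.
    """
    pattern = re.sub(r"\s+", "", pattern)  # spaces between tokens are insignificant
    return re.sub(
        r"<([A-Z|]+)>",
        # (?= |$) requires the tag to end there, so <NOU> cannot match NOUN
        lambda m: "(?: (?:" + m.group(1) + ")(?= |$))",
        pattern,
    )


def pos_regex_spans(tags, pattern):
    """Return (start, end) token-index spans whose tags match the pattern."""
    tag_str = "".join(" " + tag for tag in tags)  # e.g. " PRON ADV VERB"
    spans = []
    for m in re.finditer(pos_pattern_to_regex(pattern), tag_str):
        # every token contributes one leading space, so counting spaces
        # before a character offset converts it to a token index
        start = tag_str[: m.start()].count(" ")
        end = tag_str[: m.end()].count(" ")
        spans.append((start, end))
    return spans


# Hand-assigned universal POS tags (an assumption, not actual spaCy output)
tokens = ["He", "recently", "wrote", "the", "sky", "is", "full", "of", "stars", "."]
tags = ["PRON", "ADV", "VERB", "DET", "NOUN", "AUX", "ADJ", "ADP", "NOUN", "PUNCT"]

svo_pattern = r"<DET>? <NOUN|PROPN|PRON>+ <VERB>?<ADV>*<VERB>+ <DET>? <NOUN|PROPN|PRON>+"
for start, end in pos_regex_spans(tags, svo_pattern):
    print(" ".join(tokens[start:end]))  # prints: He recently wrote the sky
```

On these hand-assigned tags the pattern matches "He recently wrote the sky". So if textacy returns nothing for the same sentence, the difference most likely lies either in how the library translates the pattern (for example, how it handles a three-way alternative such as `<NOUN|PROPN|PRON>`) or in the tags spaCy actually assigns, which `[tok.pos_ for tok in sent]` will show.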