How to match an SVO pattern with Textacy

Question

How do you use Textacy's pos_regex_matches() function to find subject-verb-object triples using its pseudo-regular-expression syntax? And yes, I'm aware of textacy.extract.subject_verb_object_triples(), but that function is very inaccurate and finds very little, so I'm attempting to build something more robust.

For the text:

text = "He recently wrote the sky is full of stars."

I'm trying:

import textacy

svo_pattern = r'<DET>? <NOUN|PROPN|PRON>+ <VERB>?<ADV>*<VERB>+ <DET>? <NOUN|PROPN|PRON>+'
doc = textacy.Doc(text)
for sent in doc.sents:  # iterate over the sentences of the parsed document
    matches = list(textacy.extract.pos_regex_matches(sent, svo_pattern))
    print(matches)

but it doesn't find anything. What's the flaw in my pattern? I've played with several variations of it, but nothing matches.

What SVO would you expect to extract from your example sentence? — Anthony Hughes, Jul 03 '17 at 15:13
@AnthonyHughes, `("He", "recently wrote", "the sky is full of stars")` — Cerin, Jul 03 '17 at 21:37
I've had a play with the Textacy API, but I think the tool is not really cut out for what you're trying to achieve. subject_verb_object_triples() is really for SVO languages and strictly SVO-formed sentences. I also feel that pos_regex_matches() is a bit too rigid for extracting an SVO from your sentence, which is actually far more complex than a simple SVO sentence. I would be very interested to see your work if you have any further examples to share. — Anthony Hughes, Jul 04 '17 at 10:53
Potentially an alternative here based on NLTK https://github.com/klintan/py-nltk-svo — Anthony Hughes, Jul 04 '17 at 10:55
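
A rough way to test the pattern independently of textacy is to run it over hand-assigned POS tags with plain `re`. The sketch below is not textacy's implementation: `compile_pos_pattern` and `pos_matches` are hypothetical helpers, the tags in `TAGGED` are assigned by hand rather than produced by a tagger, and each `<...>` group is assumed to hold any number of `|`-separated tags. If this matches while pos_regex_matches() does not, the three-way alternation `<NOUN|PROPN|PRON>` would be one thing to check in your textacy version (I have not verified how it is translated there).

```python
import re


def compile_pos_pattern(pattern):
    """Turn '<A|B>+ <C>?' into a regex over a string like ' A B C'.

    Hypothetical helper for illustration only, not part of textacy.
    """
    pattern = re.sub(r'\s', '', pattern)
    # Each <...> group becomes: one space, one of the listed tags,
    # then a lookahead so the tag must end at a space or end of string.
    pattern = re.sub(r'<([A-Z|]+)>', r'(?: (?:\1)(?= |$))', pattern)
    return re.compile(pattern)


def pos_matches(tagged, pattern):
    """Yield the word lists whose POS tag sequence matches the pattern."""
    tags = ''.join(' ' + pos for _, pos in tagged)
    for m in compile_pos_pattern(pattern).finditer(tags):
        # Counting spaces converts character offsets into token offsets.
        start = tags[:m.start()].count(' ')
        end = tags[:m.end()].count(' ')
        if end > start:
            yield [word for word, _ in tagged[start:end]]


# Hand-assigned tags (an assumption; a real tagger may disagree).
TAGGED = [
    ('He', 'PRON'), ('recently', 'ADV'), ('wrote', 'VERB'),
    ('the', 'DET'), ('sky', 'NOUN'), ('is', 'AUX'), ('full', 'ADJ'),
    ('of', 'ADP'), ('stars', 'NOUN'), ('.', 'PUNCT'),
]

SVO_PATTERN = (r'<DET>? <NOUN|PROPN|PRON>+ <VERB>?<ADV>*<VERB>+ '
               r'<DET>? <NOUN|PROPN|PRON>+')

print(list(pos_matches(TAGGED, SVO_PATTERN)))
# prints [['He', 'recently', 'wrote', 'the', 'sky']]
```

With these tags the pattern matches only the span "He recently wrote the sky", not the triple `("He", "recently wrote", "the sky is full of stars")` given above, which fits the comment that a flat POS regex is too rigid when the object is itself a clause.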
