How do you use Textacy's pos_regex_matches() function to find subject-verb-object triples using its pseudo-regular-expression syntax? And yes, I'm aware of textacy.extract.subject_verb_object_triples(), but that function is very inaccurate and finds very little, so I'm attempting to build something more robust.

For the text:

text = "He recently wrote the sky is full of stars."

I'm trying:

import textacy

svo_pattern = r'<DET>? <NOUN|PROPN|PRON>+ <VERB>?<ADV>*<VERB>+ <DET>? <NOUN|PROPN|PRON>+'
doc = textacy.Doc(text)
# Try the pattern against each sentence of the parsed document
for sent in doc.sents:
    matches = list(textacy.extract.pos_regex_matches(sent, svo_pattern))
    print(matches)

but it doesn't find anything. What's the flaw in my pattern? I've played with several variations of it, but nothing matches.

Cerin
  • What SVO would you expect to extract from your example sentence? – Anthony Hughes Jul 03 '17 at 15:13
  • @AnthonyHughes, `("He", "recently wrote", "the sky is full of stars")` – Cerin Jul 03 '17 at 21:37
  • I've had a play with the Textacy API, but I don't think the tool is really cut out for what you're trying to achieve. subject_verb_object_triples() is really meant for SVO languages and strictly SVO-formed sentences. I also feel that pos_regex_matches() is a bit too rigid for extracting an SVO from your sentence, which is actually far more complex than a simple SVO sentence. I'd be very interested to see your work if you have any further examples to share. – Anthony Hughes Jul 04 '17 at 10:53
  • Potentially an alternative here based on NLTK https://github.com/klintan/py-nltk-svo – Anthony Hughes Jul 04 '17 at 10:55
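
One way to narrow down where the pattern fails is to test it against a tag sequence outside the library. The sketch below is not Textacy's implementation: `compile_pos_pattern` and `pos_matches` are hypothetical helpers written for illustration, and the POS tags are assigned by hand as an assumption about what a tagger might produce for the example sentence (a real model may tag "is" or other tokens differently).

```python
import re

# Hand-assigned coarse POS tags: an assumption, not real tagger output.
tagged = [
    ("He", "PRON"), ("recently", "ADV"), ("wrote", "VERB"),
    ("the", "DET"), ("sky", "NOUN"), ("is", "VERB"),
    ("full", "ADJ"), ("of", "ADP"), ("stars", "NOUN"), (".", "PUNCT"),
]

svo_pattern = r'<DET>? <NOUN|PROPN|PRON>+ <VERB>?<ADV>*<VERB>+ <DET>? <NOUN|PROPN|PRON>+'


def compile_pos_pattern(pattern):
    """Hypothetical helper: turn '<A|B>' tokens into a regex over ' TAG TAG ...'."""
    pattern = re.sub(r"\s+", "", pattern)
    # '<NOUN|PROPN>' becomes '(?: (?:NOUN|PROPN))': one space-prefixed tag
    pattern = re.sub(r"<([A-Z|]+)>", r"(?: (?:\1))", pattern)
    return re.compile(pattern)


def pos_matches(tagged, pattern):
    """Hypothetical helper: yield the token text of each pattern match."""
    tags = "".join(" " + tag for _, tag in tagged)
    for m in compile_pos_pattern(pattern).finditer(tags):
        # Each tag is preceded by exactly one space, so counting spaces
        # converts character offsets into token indices.
        start = tags[:m.start()].count(" ")
        end = start + m.group().count(" ")
        yield " ".join(word for word, _ in tagged[start:end])


print(list(pos_matches(tagged, svo_pattern)))
```

Under these assumed tags, a plain regex reading of the pattern matches "He recently wrote the sky". If the library still returns nothing, the cause may lie in the tags the model actually assigns or in how the library translates the pattern, so printing each token's POS tag is a reasonable next check.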