The aim is to extract subtrees (phrases) from a sentence wherever an 'nsubj' (nominal subject) dependency occurs in it.

Here is the code which I am using:

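import spacy

# 'en' is a spaCy v2 shortcut that v3 no longer supports; loading the
# pipeline by its package name works in both versions (once installed)
nlp = spacy.load('en_core_web_sm')
piano_doc = nlp('The alarm clock is, to many high school students, a wailing monstrosity whose purpose is to torture all who are sleep-deprived')
for token in piano_doc:
    if token.dep_ == 'nsubj':
        print(token.text, token.tag_, token.head.text, token.dep_)
        # token.subtree yields the token and all of its syntactic descendants
        print([t.text for t in token.subtree])
        print('*' * 50)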

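The output we get is:

clock NN is nsubj
['The', 'alarm', 'clock']
**************************************************
purpose NN is nsubj
['whose', 'purpose']
**************************************************
who WP are nsubj
['who']
**************************************************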


But the output I expect for each nsubj is the whole subtree, i.e.:

purpose NN is nsubj
['whose', 'purpose', 'is', 'to', 'torture']

who WP are nsubj
['who', 'are', 'sleep-deprived']

  • I believe your understanding of what a subtree means may be wrong. Verbs like "is" or "are" cannot be part of the subject's subtree: a subtree is defined by the dependencies, so it contains only a token and the tokens that depend on it, directly or indirectly. What exactly are you trying to extract? – krisograbek Jul 09 '21 at 03:05
  • I am trying to extract all possible phrases from a sentence. – aravind kamarsu Jul 09 '21 at 14:40
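To make the comment concrete, here is a plain-Python sketch of what a subtree is, without spaCy. The parse is hand-written for the clause "whose purpose is to torture all who are sleep-deprived" and is an assumption for illustration (simplified tokenization, guessed heads and labels), not spaCy's actual output. subtree() is a hypothetical helper that mimics what spaCy's Token.subtree yields: the token plus everything that depends on it.

```python
from collections import defaultdict

# Hand-written parse (an assumption for illustration, not spaCy output).
# HEADS[i] is the index of token i's head; the root "is" points to itself,
# which is also how spaCy marks the root.
WORDS = ["whose", "purpose", "is", "to", "torture", "all", "who", "are", "sleep-deprived"]
HEADS = [1, 2, 2, 4, 2, 4, 7, 5, 7]
DEPS = ["poss", "nsubj", "ROOT", "aux", "xcomp", "dobj", "nsubj", "relcl", "acomp"]


def subtree(i, heads, words):
    """Hypothetical helper: token i plus all of its direct and indirect
    dependents, returned in sentence order."""
    children = defaultdict(list)
    for child, head in enumerate(heads):
        if child != head:  # skip the root's self-reference
            children[head].append(child)
    seen, stack = set(), [i]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    return [words[j] for j in sorted(seen)]


for i, dep in enumerate(DEPS):
    if dep == "nsubj":
        print(WORDS[i], "->", subtree(i, HEADS, WORDS))
# purpose -> ['whose', 'purpose']
# who -> ['who']
```

Neither "is" nor "are" appears in these spans, because in this parse each verb is the head of its subject, not one of its dependents.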

1 Answer

As krisograbek mentioned, your understanding of a subtree is not what a subtree means in spaCy, or in dependency parsing in general.

In dependency parsing, if you have a subject and a verb, the verb is the head and the subject is its dependent. A subtree only extends downward from a token to its dependents, so the subtree of the subject does not include the verb.

I am not sure exactly what you want, but maybe you should try token.head.subtree for the subject.
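Here is a rough plain-Python illustration of that suggestion: take the subtree of the subject's head instead of the subject itself. The parse of "whose purpose is to torture all who are sleep-deprived" is hand-written and assumed for illustration, not what spaCy actually predicts; head_subtree() is a hypothetical stand-in for token.head.subtree.

```python
# Hand-written parse (an assumption for illustration, not spaCy output).
# HEADS[i] is the index of token i's head; the root "is" points to itself.
WORDS = ["whose", "purpose", "is", "to", "torture", "all", "who", "are", "sleep-deprived"]
HEADS = [1, 2, 2, 4, 2, 4, 7, 5, 7]
DEPS = ["poss", "nsubj", "ROOT", "aux", "xcomp", "dobj", "nsubj", "relcl", "acomp"]


def subtree(i, heads, words):
    """Token i plus all of its direct and indirect dependents, in sentence order."""
    def collect(node):
        found = [node]
        for child, head in enumerate(heads):
            if head == node and child != node:  # ignore the root's self-loop
                found.extend(collect(child))
        return found

    return [words[j] for j in sorted(collect(i))]


def head_subtree(i, heads, words):
    """Hypothetical stand-in for spaCy's token.head.subtree."""
    return subtree(heads[i], heads, words)


for i, dep in enumerate(DEPS):
    if dep == "nsubj":
        print(WORDS[i], "->", head_subtree(i, HEADS, WORDS))
# purpose -> ['whose', 'purpose', 'is', 'to', 'torture', 'all', 'who', 'are', 'sleep-deprived']
# who -> ['who', 'are', 'sleep-deprived']
```

Under this assumed parse, "who" gives exactly the phrase expected in the question, while "purpose" gives the whole clause (the expected words plus the rest). With a real model, the spans depend on the predicted heads.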

polm23
  • I get it. It's actually a mistake to take the subtree of the nsubj word. What I am trying to achieve is: given a sentence, extract all possible phrases from it. One possible way of doing that is through subtrees, which is why I went that way. – aravind kamarsu Jul 09 '21 at 14:43
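For the broader goal of listing candidate phrases, one approach in the spirit of this thread is to collect the subtree of every token that has at least one dependent. Below is a plain-Python sketch on a hand-written parse (an assumption, not spaCy's actual parse); subtree_indices() and subtree_phrases() are hypothetical helpers. In spaCy itself, the same loop would iterate over the Doc and join t.text for t in token.subtree, and doc.noun_chunks is another built-in source of base noun phrases.

```python
# Hand-written parse (an assumption for illustration, not spaCy output).
# HEADS[i] is the index of token i's head; the root "is" points to itself.
WORDS = ["whose", "purpose", "is", "to", "torture", "all", "who", "are", "sleep-deprived"]
HEADS = [1, 2, 2, 4, 2, 4, 7, 5, 7]


def subtree_indices(i, heads):
    """Indices of token i and all of its direct and indirect dependents."""
    found, frontier = {i}, [i]
    while frontier:
        node = frontier.pop()
        for child, head in enumerate(heads):
            if head == node and child != node and child not in found:
                found.add(child)
                frontier.append(child)
    return sorted(found)


def subtree_phrases(heads, words):
    """One candidate phrase per token that has at least one dependent."""
    phrases = []
    for i in range(len(words)):
        span = subtree_indices(i, heads)
        if len(span) > 1:  # single tokens are not interesting as phrases
            phrases.append(" ".join(words[j] for j in span))
    return phrases


for phrase in subtree_phrases(HEADS, WORDS):
    print(phrase)
# whose purpose
# whose purpose is to torture all who are sleep-deprived
# to torture all who are sleep-deprived
# all who are sleep-deprived
# who are sleep-deprived
```

Joining the sorted indices reads as a phrase because subtrees in a projective parse cover contiguous spans; with a non-projective parse a subtree can have gaps, so the joined text may not be a contiguous phrase.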