Can I use spaCy in Python to find NPs with specific neighbors? I want noun phrases from my text that have a verb before and after them.
-
Text and output example ? – DhruvPathak Jun 21 '17 at 05:51
3 Answers
- You can merge the noun phrases (so that they do not get tokenized separately).
- Analyse the dependency parse tree, and check the POS of neighbouring tokens.
>>> import spacy
>>> nlp = spacy.load('en')
>>> sent = u'run python program run, to make this work'
>>> parsed = nlp(sent)
>>> list(parsed.noun_chunks)
[python program]
>>> for noun_phrase in list(parsed.noun_chunks):
...     noun_phrase.merge(noun_phrase.root.tag_, noun_phrase.root.lemma_, noun_phrase.root.ent_type_)
...
python program
>>> [(token.text, token.pos_) for token in parsed]
[(u'run', u'VERB'), (u'python program', u'NOUN'), (u'run', u'VERB'), (u',', u'PUNCT'), (u'to', u'PART'), (u'make', u'VERB'), (u'this', u'DET'), (u'work', u'NOUN')]
By analysing the POS of adjacent tokens, you can get your desired noun phrases.
- A better approach would be to analyse the dependency parse tree and look at the lefts and rights of the noun phrase, so that even if there is punctuation or another POS tag between the noun phrase and the verb, you can still widen your search coverage.
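The neighbour check itself can be sketched as a plain function over (text, POS) pairs, i.e. over the kind of list the merged parse above produces (the function name is illustrative, not part of spaCy):

```python
def verb_flanked_phrases(tagged):
    """Return the text of NOUN tokens whose immediate neighbours are both
    verbs. `tagged` is a list of (text, POS) pairs, e.g. the output of
    [(t.text, t.pos_) for t in parsed] after merging the noun chunks."""
    hits = []
    for i in range(1, len(tagged) - 1):
        if (tagged[i][1] == 'NOUN'
                and tagged[i - 1][1] == 'VERB'
                and tagged[i + 1][1] == 'VERB'):
            hits.append(tagged[i][0])
    return hits

# The tagged output from the session above
tagged = [('run', 'VERB'), ('python program', 'NOUN'), ('run', 'VERB'),
          (',', 'PUNCT'), ('to', 'PART'), ('make', 'VERB'),
          ('this', 'DET'), ('work', 'NOUN')]
print(verb_flanked_phrases(tagged))  # ['python program']
```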

DhruvPathak
-
It looks good, but I want to automatically fetch all the noun phrases that have verbs before and after them. For one sentence, one can easily read, analyze, and parse, but what about a pandas data frame with 5000 records, where each record has one cell of text you want to analyze? – Vivek Khetan Jun 23 '17 at 17:49
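For many records, the per-row work can be wrapped in a function and applied over the column; a minimal sketch with pandas, where `extract_np` is a labeled stand-in for the real spaCy call (with spaCy you would feed the column through `nlp.pipe` to batch the parsing efficiently):

```python
import pandas as pd

def extract_np(text):
    # Stand-in for the real per-document spaCy work (parse, merge noun
    # chunks, check neighbouring POS tags); it just flags a hard-coded
    # phrase here so the sketch stays self-contained.
    return ['python program'] if 'python program' in text else []

df = pd.DataFrame({'text': ['run python program run', 'nothing to see']})
df['noun_phrases'] = df['text'].apply(extract_np)
print(df['noun_phrases'].tolist())  # [['python program'], []]
```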
-
Actually, I am using spaCy for the first time and am very new to NLP. In your answer you are outputting all the tokens and the POS tags attached to them. I am interested in extracting noun phrases that have verbs before and after them. – Vivek Khetan Jun 23 '17 at 18:15
-
`(u'run', u'VERB'), (u'python program', u'NOUN'), (u'run', u'VERB')` what does this line signify about "python program"? – DhruvPathak Jun 23 '17 at 18:18
-
Isn't it all the tokenized words with their POS tags? In this case, they happened to be in the order verb + noun + verb. I was looking to extract all such combinations from a large corpus of text. I did some reading and I think it can be easily done by navigating the parse tree. – Vivek Khetan Jun 23 '17 at 19:30
-
I think that you basically got the answer to your question handed to you on a plate, and it almost sounds like you fail to see it. As @DhruvPathak indicates, either you phrased your question badly and you actually mean something else, or this code does exactly what you asked for. – Igor Jul 11 '17 at 09:53
-
One key part was extracting only verb-noun-verb patterns, so no, he did not get the answer handed to him on a plate. I also have a similar question and do not see the answer here. – Joshua Stafford Oct 24 '18 at 14:29
-
@HuckIt The answer does lie on the plate if observed with a keen eye. You would indeed see the answer here. – DhruvPathak Oct 24 '18 at 17:26
From https://spacy.io/usage/linguistic-features#dependency-parse
You can use noun chunks.

Noun chunks are "base noun phrases" – flat phrases that have a noun as their head. You can think of noun chunks as a noun plus the words describing the noun – for example, "the lavish green grass" or "the world's largest tech fund". To get the noun chunks in a document, simply iterate over Doc.noun_chunks.
In:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u"Autonomous cars shift insurance liability toward manufacturers")
for chunk in doc.noun_chunks:
    print(chunk.text)
Out:
Autonomous cars
insurance liability
manufacturers

aerin
-
This doesn't filter noun chunks to only the chunks that have verbs before and after them. – Joshua Stafford Oct 24 '18 at 14:27
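That filtering can be sketched without merging at all, because each chunk span carries `start` and `end` token indices (real spaCy attributes); the helper below works on plain lists so the check itself is easy to see, and the function name is illustrative:

```python
def chunks_with_verb_neighbours(pos_tags, chunk_spans):
    """pos_tags: per-token POS list, like [t.pos_ for t in doc].
    chunk_spans: (start, end) token-index pairs, like
    [(c.start, c.end) for c in doc.noun_chunks].
    Returns only the spans whose neighbouring tokens are both verbs."""
    keep = []
    for start, end in chunk_spans:
        if (start > 0 and end < len(pos_tags)
                and pos_tags[start - 1] == 'VERB'
                and pos_tags[end] == 'VERB'):
            keep.append((start, end))
    return keep

# Illustrative tags for "run python program run , to make this work",
# where the chunk "python program" covers token indices 1-2 (span (1, 3))
pos_tags = ['VERB', 'PROPN', 'NOUN', 'VERB', 'PUNCT',
            'PART', 'VERB', 'DET', 'NOUN']
print(chunks_with_verb_neighbours(pos_tags, [(1, 3)]))  # [(1, 3)]
```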
If you want to re-tokenize using merged phrases, I prefer this (rather than noun chunks):
import spacy
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(nlp.create_pipe('merge_noun_chunks'))
doc = nlp(u"Autonomous cars shift insurance liability toward manufacturers")
for token in doc:
    print(token.text)
and the output will be:
Autonomous cars
shift
insurance liability
toward
manufacturers
I chose this approach because each merged token keeps its properties for further processing :)
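Those per-token properties are what make the original verb-noun-verb question easy to answer after the merge; a small sketch that pulls the (verb, noun phrase, verb) triples out of the merged token sequence (the helper is illustrative, not spaCy API):

```python
def np_with_flanking_verbs(tokens):
    """tokens: (text, POS) pairs as produced after the merge_noun_chunks
    pipe, e.g. [(t.text, t.pos_) for t in doc].
    Returns (verb, noun phrase, verb) triples."""
    triples = []
    for before, cur, after in zip(tokens, tokens[1:], tokens[2:]):
        if cur[1] == 'NOUN' and before[1] == 'VERB' and after[1] == 'VERB':
            triples.append((before[0], cur[0], after[0]))
    return triples

# Illustrative merged output for "run python program run to make this work"
tokens = [('run', 'VERB'), ('python program', 'NOUN'), ('run', 'VERB'),
          ('to', 'PART'), ('make', 'VERB'), ('this', 'DET'), ('work', 'NOUN')]
print(np_with_flanking_verbs(tokens))  # [('run', 'python program', 'run')]
```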

Syauqi Haris