
I got triples using the following code, but I want to extract the nouns and adjectives from those triples. I have tried a lot but failed. I am new to NLTK and Python; any help?

from nltk.parse.stanford import StanfordDependencyParser
# Raw strings keep the backslashes in the Windows paths from being read as escape sequences.
dp_prsr = StanfordDependencyParser(r'C:\Python34\stanford-parser-full-2015-04-20\stanford-parser.jar', r'C:\Python34\stanford-parser-full-2015-04-20\stanford-parser-3.5.2-models.jar', model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
word=[]
s='bit is good university'
sentence = dp_prsr.raw_parse(s)
for line in sentence:
    print(list(line.triples()))

[(('university', 'NN'), 'nsubj', ('bit', 'NN')), (('university', 'NN'), 'cop', ('is', 'VBZ')), (('university', 'NN'), 'amod', ('good', 'JJ'))]

I want to get university and good, and BIT and university. I tried the following, but it didn't work:

   for line in sentence:
    if (list(line.triples)).__contains__()  == 'JJ':
       word.append(list(line.triples()))
   print(word)

But I get an empty list. Any help, please?

alvas
nizam uddin

1 Answer

Linguistically

What you're looking for when you search for triplets that contain a JJ and an NN is usually a noun phrase (NP) in a context-free grammar.

In dependency grammar, what you're looking for is a triplet whose arguments carry the JJ and NN POS tags. More specifically, you're looking for a constituent / branch that contains an adjectivally modified noun. In the StanfordDependencyParser output, you need to look for the predicate amod. (If the explanation above is confusing, it is advisable to read up on dependency grammar before proceeding; see https://en.wikipedia.org/wiki/Dependency_grammar.)

Note that the parser outputs triplets of the form (arg1, pred, arg2), where argument 2 (arg2) depends on argument 1 (arg1) through the predicate (pred) relation; i.e. arg1 governs arg2 (see https://en.wikipedia.org/wiki/Government_(linguistics)).
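
To make the (arg1, pred, arg2) layout concrete, here is a minimal sketch that unpacks the triples from the question and prints each one as "governor --relation--> dependent". The helper name describe is made up for illustration and is not part of NLTK.

```python
# Triples copied verbatim from the parser output shown in the question.
triples = [
    (('university', 'NN'), 'nsubj', ('bit', 'NN')),
    (('university', 'NN'), 'cop', ('is', 'VBZ')),
    (('university', 'NN'), 'amod', ('good', 'JJ')),
]


def describe(triple):
    """Render one triple as 'governor (TAG) --relation--> dependent (TAG)'.

    Hypothetical helper for illustration only; it is not an NLTK function.
    """
    (gov_word, gov_tag), relation, (dep_word, dep_tag) = triple
    return "{} ({}) --{}--> {} ({})".format(
        gov_word, gov_tag, relation, dep_word, dep_tag)


descriptions = [describe(t) for t in triples]
for line in descriptions:
    print(line)
```

Every triple in this sentence has 'university' as arg1, which is another way of saying that 'university' governs 'bit', 'is' and 'good'.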


Pythonically

Now to the code part of the answer. You want to iterate through a list of tuples (i.e. triplets), so the easiest solution is to unpack each tuple into variables as you iterate and then check for the conditions you need; see "Find an element in a list of tuples".

>>> x = [(('university', 'NN'), 'nsubj', ('bit', 'NN')), (('university', 'NN'), 'cop', ('is', 'VBZ')), (('university', 'NN'), 'amod', ('good', 'JJ'))]
>>> for arg1, pred, arg2 in x:
...     word1, pos1 = arg1
...     word2, pos2 = arg2
...     if pos1.startswith('NN') and pos2.startswith('JJ') and pred == 'amod':
...             print ((arg1, pred, arg2))
... 
(('university', 'NN'), 'amod', ('good', 'JJ'))
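
If you want the words themselves rather than the whole triple, the same unpacking idea can be extended. The following is only a sketch under a few assumptions: the function names words_with_tags and amod_pairs are invented here, "noun" and "adjective" are taken to mean any tag starting with NN or JJ, and the triples are hard-coded from the question instead of coming from a live parser.

```python
# Triples copied verbatim from the parser output shown in the question.
triples = [
    (('university', 'NN'), 'nsubj', ('bit', 'NN')),
    (('university', 'NN'), 'cop', ('is', 'VBZ')),
    (('university', 'NN'), 'amod', ('good', 'JJ')),
]


def words_with_tags(triples, prefixes=('NN', 'JJ')):
    """Collect unique words whose POS tag starts with one of the prefixes.

    Hypothetical helper; words are returned in first-seen order.
    """
    found = []
    for governor, _relation, dependent in triples:
        for word, tag in (governor, dependent):
            # str.startswith accepts a tuple of prefixes.
            if tag.startswith(prefixes) and word not in found:
                found.append(word)
    return found


def amod_pairs(triples):
    """Return (adjective, noun) pairs linked by the amod relation."""
    return [
        (dep_word, gov_word)
        for (gov_word, gov_tag), relation, (dep_word, dep_tag) in triples
        if relation == 'amod'
        and gov_tag.startswith('NN')
        and dep_tag.startswith('JJ')
    ]


words = words_with_tags(triples)
pairs = amod_pairs(triples)
print(words)
print(pairs)
```

With a live parser you would presumably pass list(line.triples()) for each parse instead of the hard-coded list.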
alvas
  • Great, this is what I wanted, and it worked. But I saw that this can also be done using Tregex; would it be better to do it with Tregex? If yes, could you help with code for the above example? Thanks in advance. – nizam uddin Dec 12 '15 at 10:34
  • Possibly, but I don't use Tregex. Asking a separate question might get you answers from people who know that package better =) Otherwise, post to https://groups.google.com/forum/?utm_source=digest&utm_medium=email#!forum/nltk-users and see whether you get some help there. – alvas Dec 12 '15 at 10:54