
I'm currently working on an aspect-level sentiment analysis project using online travel reviews.

I'm using Stanford CoreNLP to get things done. So far, I have managed to pre-process the data by POS tagging and lemmatizing the review content.

I have read several papers on sentiment analysis, and it looks like the next step is to extract aspect terms from the review text along with their sentiment polarity. I have seen a Python NLTK video tutorial in which regular expressions over POS-tagged words were used to find noun phrases and similar structures. I want to do the same using the Stanford Dependency Parser.
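For reference, the NLTK-style approach (a regular expression over POS tags, similar to a grammar such as `NP: {<JJ.*>*<NN.*>+}`) can be reproduced in plain Java with `java.util.regex`. This is a minimal sketch, not CoreNLP API: the class and method names are hypothetical, and the words and tags are copied by hand from the parse tree further down rather than produced by a CoreNLP pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagPatternChunker {

    // Each token is encoded as "word/TAG " so the regex can match on tags.
    // Pattern: zero or more adjectives (JJ, JJR, JJS) followed by one or more
    // nouns (NN, NNS, NNP, NNPS).
    private static final Pattern NOUN_PHRASE =
            Pattern.compile("(?:\\S+/JJ\\S* )*(?:\\S+/NN\\S* )+");

    /** Returns the word sequences matched by NOUN_PHRASE; tags[i] is the POS tag of tokens[i]. */
    public static List<String> chunk(String[] tokens, String[] tags) {
        StringBuilder encoded = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            encoded.append(tokens[i]).append('/').append(tags[i]).append(' ');
        }
        List<String> phrases = new ArrayList<>();
        Matcher m = NOUN_PHRASE.matcher(encoded);
        while (m.find()) {
            // Drop the "/TAG" suffixes to get the plain phrase back.
            phrases.add(m.group().replaceAll("/\\S+", "").trim());
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Words and tags copied by hand from the parse tree in this question.
        String[] tokens = {"It", "is", "not", "a", "museum", "but", "a", "living", "historic",
                "town", "with", "wonderful", "places", "to", "eat", "drink", "and", "do", "shopping"};
        String[] tags = {"PRP", "VBZ", "RB", "DT", "NN", "CC", "DT", "VBG", "JJ",
                "NN", "IN", "JJ", "NNS", "TO", "VB", "NN", "CC", "VB", "NN"};
        System.out.println(chunk(tokens, tags));
        // prints [museum, historic town, wonderful places, drink, shopping]
    }
}
```

Because "eat" is tagged VB, a noun-only pattern misses it; catching activities would need a second pattern over verb tags.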

Unfortunately, I do not understand how to use the output of the Stanford Dependency Parser to write such rules to identify aspect terms.
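To make "rules over dependency output" concrete, here is a minimal sketch that treats each typed dependency as a (relation, governor, dependent) triple and applies two illustrative rules: `amod(N, A)` marks N as an aspect with opinion word A, and `dobj(V, N)` marks N as an aspect. The triples are hand-written in the style of Stanford typed dependencies and are not actual parser output; the rule set and class names are assumptions for illustration. Newer Universal Dependencies output names the direct-object relation `obj`, so both names are accepted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DependencyAspectRules {

    /** One typed dependency in the form relation(governor, dependent). */
    public static final class Dep {
        final String rel;
        final String gov;
        final String dep;

        public Dep(String rel, String gov, String dep) {
            this.rel = rel;
            this.gov = gov;
            this.dep = dep;
        }
    }

    /**
     * Maps each aspect candidate to an attached opinion word ("" if none):
     *   amod(N, A)        -> N is an aspect, A is an opinion word about it
     *   dobj(V, N) / obj  -> N is an aspect (object of a verb)
     */
    public static Map<String, String> extract(List<Dep> deps) {
        Map<String, String> aspects = new LinkedHashMap<>();  // keeps insertion order
        for (Dep d : deps) {
            switch (d.rel) {
                case "amod":
                    aspects.put(d.gov, d.dep);     // a later amod overwrites an earlier one
                    break;
                case "dobj":                       // Stanford Dependencies name
                case "obj":                        // Universal Dependencies name
                    aspects.putIfAbsent(d.dep, "");
                    break;
                default:
                    break;                         // all other relations are ignored
            }
        }
        return aspects;
    }

    public static void main(String[] args) {
        // Hand-written, illustrative triples; not actual parser output.
        List<Dep> deps = List.of(
                new Dep("amod", "town", "historic"),
                new Dep("amod", "places", "wonderful"),
                new Dep("dobj", "eat", "drink"),
                new Dep("dobj", "do", "shopping"));
        System.out.println(extract(deps));
        // prints {town=historic, places=wonderful, drink=, shopping=}
    }
}
```

Two rules are only a starting point; a real system would add more relations (for example `nsubj` for "the food is great") and tune them on annotated reviews.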

For two days I have been looking for sample Java code that explains exactly how to accomplish this task, but so far with no luck.

I would really appreciate it if someone could point me to a tutorial or sample code that would help me understand the procedure.

Say I have output similar to the following:

(ROOT
  (S
    (NP (PRP It))
    (VP (VBZ is) (RB not)
      (NP
        (NP (DT a) (NN museum))
        (PP (CC but)
          (NP
            (NP (DT a) (VBG living) (JJ historic) (NN town))
            (PP (IN with)
              (NP (JJ wonderful) (NNS places)))
            (S
              (VP (TO to)
                (VP
                  (VP (VB eat)
                    (NP (NN drink)))
                  (CC and)
                  (VP (VB do)
                    (NP (NN shopping))))))))))

How can I extract museum, eat, drink, and shopping as aspects?
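The tree above is a constituency (phrase-structure) parse, and its preterminal nodes, written `(TAG word)`, already carry the POS information. A minimal sketch, assuming only the bracketed text is available: pull out the (tag, word) pairs with a regex and keep nouns plus base-form verbs, skipping a small hypothetical stop list of "light" verbs. With CoreNLP's own `Tree` object you could get the same pairs without a regex, e.g. via its `taggedYield()` method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TreeAspectCandidates {

    // A preterminal node in Penn Treebank bracket notation: "(TAG word)".
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^()\\s]+)\\s+([^()\\s]+)\\)");

    // Hypothetical stop list of light verbs that carry little aspect meaning.
    private static final Set<String> LIGHT_VERBS = Set.of("do", "be", "have");

    public static List<String> candidates(String tree) {
        List<String> result = new ArrayList<>();
        Matcher m = PRETERMINAL.matcher(tree);
        while (m.find()) {
            String tag = m.group(1);
            String word = m.group(2);
            boolean noun = tag.startsWith("NN");   // NN, NNS, NNP, NNPS
            boolean baseVerb = tag.equals("VB");   // base-form verbs such as "eat"
            if ((noun || baseVerb) && !LIGHT_VERBS.contains(word.toLowerCase())) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String tree = "(ROOT (S (NP (PRP It)) (VP (VBZ is) (RB not) (NP (NP (DT a) (NN museum)) "
                + "(PP (CC but) (NP (NP (DT a) (VBG living) (JJ historic) (NN town)) "
                + "(PP (IN with) (NP (JJ wonderful) (NNS places))) (S (VP (TO to) (VP (VP (VB eat) "
                + "(NP (NN drink))) (CC and) (VP (VB do) (NP (NN shopping))))))))))";
        System.out.println(candidates(tree));
        // prints [museum, town, places, eat, drink, shopping]
    }
}
```

This also returns "town" and "places", so narrowing the list to museum, eat, drink, and shopping would need a further filter, for example one based on how often each candidate appears across the whole review corpus.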

Any help is greatly appreciated.

Mahesh De Silva
  • I take it that you have already checked the documentation on the parser itself? You can give the options -outputFormat typedDependencies or -outputFormat typedDependenciesCollapsed to get typed dependencies (grammatical relations) output (currently for English and Chinese only). You can print out lexicalized trees (with head words and tags at each phrasal node) using the -outputFormatOptions lexicalize option. You can see all the other options in the Javadoc of the TreePrint class. Also, it looks like Manning himself has already answered a similar question: stackoverflow.com/questions/11832490 – matcheek Feb 15 '16 at 16:17
  • Or http://stackoverflow.com/questions/27291367/extract-all-noun-phrases-from-stanford-parser-output-textfile-using-bash – matcheek Feb 15 '16 at 16:22
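The typedDependencies output mentioned in the first comment prints one relation per line in the form `relation(governor-index, dependent-index)`, for example `amod(places-13, wonderful-12)`. If you only have that text, a minimal sketch for reading it back into Java objects (so that rules can be applied to it) could look like the following; the class names are hypothetical, and when calling CoreNLP from Java you would normally work with the parser's own dependency objects instead of round-tripping through text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TypedDependencyReader {

    /** One line of output: relation(governor-govIndex, dependent-depIndex). */
    public static final class TypedDep {
        public final String rel;
        public final String gov;
        public final int govIndex;
        public final String dep;
        public final int depIndex;

        TypedDep(String rel, String gov, int govIndex, String dep, int depIndex) {
            this.rel = rel;
            this.gov = gov;
            this.govIndex = govIndex;
            this.dep = dep;
            this.depIndex = depIndex;
        }
    }

    // Relation names may contain ':' or '_' (e.g. "nmod:with", "prep_with").
    // The greedy word groups let words that contain hyphens ("well-known-5") parse.
    private static final Pattern LINE =
            Pattern.compile("^([\\w:]+)\\((.+)-(\\d+), (.+)-(\\d+)\\)$");

    public static List<TypedDep> parse(List<String> lines) {
        List<TypedDep> deps = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                deps.add(new TypedDep(m.group(1), m.group(2), Integer.parseInt(m.group(3)),
                        m.group(4), Integer.parseInt(m.group(5))));
            }
            // Non-matching lines (e.g. the blank line between sentences) are skipped.
        }
        return deps;
    }

    public static void main(String[] args) {
        // Example lines written in the typedDependencies text format; not real parser output.
        List<String> lines = List.of("amod(places-13, wonderful-12)", "dobj(do-18, shopping-19)", "");
        for (TypedDep d : parse(lines)) {
            System.out.println(d.rel + ": " + d.gov + " -> " + d.dep);
        }
    }
}
```

Keeping the token indices makes it possible to tell apart repeated words in one sentence (such as the two occurrences of "a").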
