"Enhancing" CoreNLP Sentiment Analysis Results

Question

I am trying to perform sentiment analysis on a large number of product reviews using CoreNLP (Java). Overall, I find the accuracy of the analysis to be pretty good. From what I read, the model I'm using was initially created using movie reviews (I think), so it's not 100% suited for analyzing product reviews. I was wondering the best way to go about "enhancing" the accuracy of my analysis.

The main thing I was thinking about was that, in addition to the text of the product review, I also have a user-provided star rating. The values range from 1 to 5, with 1 star being the lowest. I was hoping there was a way to take the star rating into account when generating the sentiment score, since it more accurately reflects the user's feelings about a particular product. Is there a way to have the star rating factor into the sentiment analysis scoring in CoreNLP? My analysis code looks something like this:

    // Build the pipeline once; constructing it per review reloads every model.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref, sentiment");
    props.setProperty("ner.model", "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    List<ProductReview> reviews = this.reviewRepository.findAll();
    for (ProductReview review : reviews) {
        int starRating = review.getStarRating(); // not yet used in the analysis
        String reviewText = review.getTitle() + " : " + review.getReviewText();
        if (!StringUtils.isEmpty(reviewText)) {
            int longest = 0;
            int mainSentiment = 0;
            String sentimentStr = null;
            Annotation annotation = pipeline.process(reviewText);
            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            for (CoreMap sentence : sentences) {
                Tree sentimentTree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);

                // Predicted class is 0..4; subtracting 2 centers the score on 0 (-2..+2).
                int sentiment = RNNCoreAnnotations.getPredictedClass(sentimentTree) - 2;
                String partText = sentence.toString();
                // Treat the longest sentence as representative of the whole review.
                if (partText.length() > longest) {
                    mainSentiment = sentiment;
                    sentimentStr = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
                    longest = partText.length();
                }
            }
        }
    }

How could I best incorporate the star ratings (or other info, such as votes on the most useful product reviews, etc.) into the analysis being performed by CoreNLP? Is this something I would have to do separately? Or is there a way to incorporate the additional data directly into the sentiment analysis engine?

DhruvPathak · Accepted Answer · 2017-06-19T17:56:19.940

There are a few enhancements possible.

1. Improved training set and contextual sentiment analysis: Some features might be classified as positive in a movie-review context but negative in a product-review context. You should retrain the model on data from your own context. The method is specified here:

Models can be retrained using the following command using the PTB format dataset:

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz

A good discussion of the training dataset can be found here.
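Once training finishes, the pipeline has to be pointed at the new model. A minimal sketch, assuming the model was written to model.ser.gz as in the command above and that the sentiment annotator picks up its model path from the sentiment.model property; the class and method names here are made up for illustration:

```java
import java.util.Properties;

final class SentimentPipelineConfig {
    // Builds pipeline properties that load a retrained sentiment model.
    // modelPath is an assumption: wherever SentimentTraining wrote its output.
    static Properties forModel(String modelPath) {
        Properties props = new Properties();
        // The sentiment annotator only needs tokens, sentence splits and parse trees.
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        props.setProperty("sentiment.model", modelPath);
        return props;
    }

    public static void main(String[] args) {
        Properties props = forModel("model.ser.gz");
        // These properties would then be passed to new StanfordCoreNLP(props).
        System.out.println(props.getProperty("sentiment.model"));
    }
}
```

The resulting Properties object replaces the one used to construct StanfordCoreNLP; the rest of the analysis code does not need to change.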

2. Getting the contextual training and testing data: Your product review data can act as both the training set and the testing set. Select the reviews with extreme polarities (1 star, the poorest, and 5 stars, the best) as your training data. To improve the content further, you can select only the 1-star and 5-star reviews that the community has marked as helpful. Using this data, generate your PTB dataset, classifying the reviews as POSITIVE and NEGATIVE (a neutral class would be hard to obtain from 2-, 3- and 4-star reviews, as they can introduce noise).
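The selection step could look roughly like the sketch below. The Review and LabeledReview classes, their field names, and the minHelpfulVotes threshold are all assumptions made for illustration; they are not part of CoreNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrainingSetSelector {
    // Hypothetical review shape; adapt the fields to your own ProductReview entity.
    static final class Review {
        final int starRating;
        final int helpfulVotes;
        final String text;
        Review(int starRating, int helpfulVotes, String text) {
            this.starRating = starRating;
            this.helpfulVotes = helpfulVotes;
            this.text = text;
        }
    }

    static final class LabeledReview {
        final String label;
        final String text;
        LabeledReview(String label, String text) {
            this.label = label;
            this.text = text;
        }
    }

    // Keeps only helpful 1-star and 5-star reviews and labels them by polarity.
    static List<LabeledReview> select(List<Review> reviews, int minHelpfulVotes) {
        List<LabeledReview> selected = new ArrayList<>();
        for (Review r : reviews) {
            if (r.helpfulVotes < minHelpfulVotes) {
                continue; // not endorsed by the community
            }
            if (r.starRating == 1) {
                selected.add(new LabeledReview("NEGATIVE", r.text));
            } else if (r.starRating == 5) {
                selected.add(new LabeledReview("POSITIVE", r.text));
            }
            // 2- to 4-star reviews are skipped because they tend to add noise.
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Review> reviews = Arrays.asList(
                new Review(1, 4, "Stopped working after a week."),
                new Review(3, 9, "It is fine, nothing special."),
                new Review(5, 7, "Exactly what I needed."));
        for (LabeledReview lr : select(reviews, 1)) {
            System.out.println(lr.label + "\t" + lr.text);
        }
    }
}
```

The labeled text still has to be converted into PTB-format trees before it can be fed to SentimentTraining.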

3. Use 80% of your dataset as the training set and 20% as the testing set. The 1-star reviews should mostly be classified as NEGATIVE and the 5-star reviews mostly as POSITIVE. After this, you can use the trained model to analyze the sentiment of other reviews. Your sentiment score (say 0 for negative sentiment and 5 for very positive sentiment, or -1 for negative to +1 for very positive) should correlate positively with the star rating provided with that review. If there is a sentiment disparity, e.g. a review text comes out as positive but carries a 1-star rating, you may want to log such cases and use them to improve your classification.
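A rough sketch of the split and of the disparity check follows. It assumes the sentiment class is on a 0..4 scale (0 very negative, 4 very positive); the cut-offs used to call a review "positive" or "negative" and all names are illustrative assumptions, so adjust them to whatever scale you settle on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SplitAndCheck {
    // Shuffles a copy of the data with a fixed seed and returns [train, test].
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    // Flags reviews whose text sentiment contradicts the star rating.
    // Assumed thresholds: class >= 3 is positive text, class <= 1 is negative text.
    static boolean isDisparity(int sentimentClass, int starRating) {
        boolean positiveText = sentimentClass >= 3;
        boolean negativeText = sentimentClass <= 1;
        return (positiveText && starRating <= 2) || (negativeText && starRating >= 4);
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ids.add(i);
        }
        List<List<Integer>> parts = split(ids, 0.8, 42L);
        System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size());
        if (isDisparity(4, 1)) {
            System.out.println("log: very positive text on a 1-star review");
        }
    }
}
```

Fixing the seed keeps the split reproducible, so accuracy numbers from different training runs stay comparable.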

4. Improving with other data sources and classifiers: VADER sentiment (in Python) is a very good classifier specially attuned to social media and things like product reviews. You may or may not choose to use it as a comparative classifier (to cross-check, or to have a second set of results, from CoreNLP + VADER), but you can certainly use its Amazon reviews dataset, as mentioned here:

amazonReviewSnippets_GroundTruth.txt FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

DESCRIPTION: includes 3,708 sentence-level snippets from 309 customer reviews on 5 different products. The reviews were originally used in Hu & Liu (2004); we added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'amazonReviewSnippets_anonDataRatings.txt' (described below).

amazonReviewSnippets_anonDataRatings.txt FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

The datasets are available in the tgz file here: https://github.com/cjhutto/vaderSentiment/blob/master/additional_resources/hutto_ICWSM_2014.tar.gz

It follows the pattern reviewindex_part, polarity, review_snippet:

1_19    -0.65   the button was probably accidentally pushed to cause the black screen in the first place.
1_20    2.85    but, if you're looking for my opinion of the apex dvd player, i love it!
1_21    1.75    it practically plays almost everything you give it.
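Reading these lines from Java is straightforward. A minimal sketch, assuming the three tab-separated columns described above (ID, mean sentiment rating, text snippet); the SnippetLineParser and Snippet names are made up for illustration:

```java
final class SnippetLineParser {
    // One row of amazonReviewSnippets_GroundTruth.txt.
    static final class Snippet {
        final String id;
        final double meanRating;
        final String text;
        Snippet(String id, double meanRating, String text) {
            this.id = id;
            this.meanRating = meanRating;
            this.text = text;
        }
    }

    // Splits a tab-delimited line into ID, mean rating and snippet text.
    // The limit of 3 keeps any further tabs as part of the text column.
    static Snippet parse(String line) {
        String[] parts = line.split("\t", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
        }
        return new Snippet(parts[0], Double.parseDouble(parts[1]), parts[2]);
    }

    public static void main(String[] args) {
        Snippet s = parse("1_21\t1.75\tit practically plays almost everything you give it.");
        System.out.println(s.id + " -> " + s.meanRating);
    }
}
```

The mean rating is a continuous intensity score, so it has to be mapped to whatever discrete classes your retrained model uses before the snippets can serve as extra training or test data.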