I am trying to perform sentiment analysis on a large number of product reviews using CoreNLP (Java). Overall, I find the accuracy of the analysis to be pretty good. From what I have read, I believe the model I'm using was originally trained on movie reviews, so it is not ideally suited to analyzing product reviews. What is the best way to go about improving the accuracy of my analysis?
My main idea is this: in addition to the text of each product review, I also have a user-provided star rating. The values range from 1 to 5, with 1 star being the lowest. I was hoping there was a way to take the star rating into account when generating the sentiment score, since it probably reflects the user's feelings about a product more accurately than the text alone. Is there a way to have the star rating factor into the sentiment scoring in CoreNLP? My analysis code looks something like this:
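    // CoreNLP and JDK imports used below
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.util.CoreMap;

    // Build the pipeline once; constructing it per review reloads the models every time.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref, sentiment");
    props.setProperty("ner.model", "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    List<ProductReview> reviews = this.reviewRepository.findAll();
    for (ProductReview review : reviews) {
        int starRating = review.getStarRating(); // 1-5, currently not used in the scoring
        String reviewText = review.getTitle() + " : " + review.getReviewText();
        if (!StringUtils.isEmpty(reviewText)) {
            int longest = 0;
            int mainSentiment = 0;
            String sentimentStr = null;
            Annotation annotation = pipeline.process(reviewText);
            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            // Use the sentiment of the longest sentence as the sentiment of the whole review.
            for (CoreMap sentence : sentences) {
                Tree sentimentTree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
                // Predicted class is 0..4; subtracting 2 shifts it to -2 (very negative)..2 (very positive).
                int sentiment = RNNCoreAnnotations.getPredictedClass(sentimentTree) - 2;
                String partText = sentence.toString();
                if (partText.length() > longest) {
                    mainSentiment = sentiment;
                    sentimentStr = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
                    longest = partText.length();
                }
            }
        }
    }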
How could I best incorporate the star ratings (or other info, such as votes on the most useful product reviews, etc.) into the analysis being performed by CoreNLP? Is this something I would have to do separately? Or is there a way to incorporate the additional data directly into the sentiment analysis engine?
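To make the "separately" option concrete, here is a minimal sketch of what I imagine doing outside CoreNLP: map the star rating onto the same -2..2 scale as `mainSentiment` above and take a weighted average of the two. The class `StarAwareSentiment`, its methods, and the `starWeight` parameter are hypothetical names of my own, not part of CoreNLP, and the linear mapping and weighting are assumptions rather than a validated approach.

```java
/** Hypothetical helper (not a CoreNLP class): blends a text sentiment with a star rating. */
final class StarAwareSentiment {
    private StarAwareSentiment() {}

    /** Maps a 1-5 star rating linearly onto -2..2 (assumption: 3 stars is neutral). */
    static double starsToScore(int stars) {
        if (stars < 1 || stars > 5) {
            throw new IllegalArgumentException("stars must be in 1..5, got " + stars);
        }
        return stars - 3;
    }

    /**
     * Weighted average of the text sentiment (-2..2) and the star-derived score.
     * starWeight = 0 ignores the stars; starWeight = 1 ignores the text.
     */
    static double blend(int textSentiment, int stars, double starWeight) {
        if (textSentiment < -2 || textSentiment > 2) {
            throw new IllegalArgumentException("textSentiment must be in -2..2");
        }
        if (starWeight < 0.0 || starWeight > 1.0) {
            throw new IllegalArgumentException("starWeight must be in 0..1");
        }
        return starWeight * starsToScore(stars) + (1.0 - starWeight) * textSentiment;
    }
}
```

In the loop above this would be called as something like `StarAwareSentiment.blend(mainSentiment, starRating, 0.5)` after the sentence loop finishes; the 0.5 weight is an arbitrary placeholder that I would need to tune.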