0

I'm working on a reviews dataset. The goal is to extract the most important positive and negative features of a specific product from its reviews, where importance is the number of times the same feature is mentioned across reviews.

Ex: some xyz car

positive: Great mileage, good looking, spacious etc

Negative: Poor power, bad performance, software problems etc

The goal is to extract the best and worst things about the product!

So far I've used gensim's doc2vec to find the top positive and negative sentences. The results are not very good, because it retrieves sentences with similar structure rather than sentences that describe similar features.

desertnaut
vijay athithya

2 Answers

2

Some write-ups of the "Word Mover's Distance" calculation, for identifying similar sentences/phrases, use reviews as their dataset and seem to extract common themes and representative phrases well.

See for example:

"Navigating themes in restaurant reviews with Word Mover’s Distance" http://tech.opentable.com/2015/08/11/navigating-themes-in-restaurant-reviews-with-word-movers-distance/

"Finding similar documents with Word2Vec and WMD" https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html

gojomo
  • Thanks, mate! After getting the results, how can we extract the features? Ex: "Car looks stunning, gives good mileage on highways." I need to extract "looks good" and "mileage good". – vijay athithya Dec 30 '18 at 06:11
  • It seems you want to reduce things to a simpler expression. From the examples I linked, I'd guess that once you have a cluster of phrases saying similar things, you could pick something like (1) the most commonly repeated; (2) the most central to all others; (3) the ones using the most-common or fewest words – or some combination of the three – to represent a cluster in a simple, perhaps-canonical way. – gojomo Dec 30 '18 at 06:18
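
As a rough illustration of options (1) and (2) from the comment above, here is a minimal Python sketch. It assumes you already have one cluster of similar phrases. The helper names (`most_repeated`, `most_central`) and the sample phrases are hypothetical, and Jaccard distance over word sets is only a dependency-free stand-in for Word Mover's Distance; any pairwise distance function could be passed in instead.

```python
from collections import Counter


def tokenize(phrase):
    """Lowercase a phrase and return its set of words."""
    return set(phrase.lower().split())


def jaccard_distance(a, b):
    """Word-overlap distance in [0, 1]; a simple stand-in for WMD."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def most_repeated(cluster):
    """Option (1): the phrase that occurs most often in the cluster."""
    normalized = Counter(" ".join(p.lower().split()) for p in cluster)
    return normalized.most_common(1)[0][0]


def most_central(cluster, distance=jaccard_distance):
    """Option (2): the medoid, i.e. the phrase with the smallest
    total distance to every other phrase in the cluster."""
    return min(cluster, key=lambda p: sum(distance(p, q) for q in cluster))


# Hypothetical cluster of phrases that all talk about mileage.
cluster = [
    "good mileage",
    "gives good mileage on highways",
    "good mileage",
    "mileage is good in the city",
]

representative_by_count = most_repeated(cluster)
representative_by_centrality = most_central(cluster)
print(representative_by_count, "|", representative_by_centrality)
```

With a real WMD implementation, only the `distance` argument would need to change; the selection logic stays the same.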
1

It looks like you want to extract the product features that are most commonly discussed in your reviews. This is a typical topic clustering problem. You could use a Latent Dirichlet Allocation (LDA) model to do the topic clustering.

This approach would give you the features; you can then run a sentiment analysis model to determine whether the sentiment towards each feature is positive or negative.
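
To make the second step concrete, here is a minimal sketch of counting positive and negative mentions per feature. It is not LDA and not a trained sentiment model: the `FEATURES` set is a hypothetical stand-in for whatever the topic model produces, and the tiny `POSITIVE`/`NEGATIVE` word lists are a toy stand-in for a real sentiment classifier. The sample reviews are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical feature words, standing in for the output of a topic model.
FEATURES = {"mileage", "looks", "power", "software"}
# Toy sentiment lexicon, standing in for a real sentiment model.
POSITIVE = {"great", "good", "stunning", "spacious"}
NEGATIVE = {"poor", "bad", "problems", "buggy"}


def tally_feature_sentiment(reviews):
    """Count, per feature, the sentences that mention it positively or negatively."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for review in reviews:
        # Score each sentence separately so one feature's sentiment
        # does not leak onto another feature in the same review.
        for sentence in re.split(r"[.!?]+", review):
            words = set(re.findall(r"[a-z']+", sentence.lower()))
            score = len(words & POSITIVE) - len(words & NEGATIVE)
            if score == 0:
                continue  # neutral or mixed sentence: skip it
            label = "positive" if score > 0 else "negative"
            for feature in words & FEATURES:
                counts[feature][label] += 1
    return dict(counts)


reviews = [
    "Great mileage. Poor power.",
    "The car looks stunning! Mileage is good on highways.",
    "Software problems every week. Bad power delivery.",
]

summary = tally_feature_sentiment(reviews)
# Rank features by how often they are praised or criticised.
top_positive = max(summary, key=lambda f: summary[f]["positive"])
top_negative = max(summary, key=lambda f: summary[f]["negative"])
print(top_positive, top_negative)
```

Sorting the features by their positive and negative counts gives the "most reviewed" good and bad points the question asks for.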

If you already know the features and want to group them into clusters, look at this Q&A and the paper mentioned in the question.

Venkatachalam