Friends, we are working on a problem where we have a dump of reviews only, with no ratings, in a .csv file. Each row in the .csv is one review given by a customer for a particular product, let's say a TV.

I want to classify that text into the pre-defined categories below, given by the domain expert for that product:

  • Quality
  • Customer
  • Support
  • Positive Feedback
  • Price
  • Technology

Some reviews are as below:

  1. Bought this product recently, feeling a great product in the market.
  2. Was waiting for this product since long, but disappointed
  3. The built quality is not that great
  4. LED screen is picture perfect. Love this product
  5. Damm! bought this TV 2 months ago, guess what, screen showing a straight line, poor quality LED screen
  6. This has very complicated options, documentation of this TV is not so user-friendly
  7. I cannot use my smart device to connect to this TV. Simply does not work
  8. Customer support is very poor. I don't recommend this
  9. Works great. Great product

Now, with the reviews above, each from a different customer, how do I categorize them into the given buckets? (You can call it multilabel classification, Named Entity Recognition, information extraction with sentiment analysis, or anything else.)

I tried all the NLP word-frequency counting approaches (in R) and referred to StanfordNLP (https://nlp.stanford.edu/software/CRF-NER.shtml) and many more, but could not get a concrete solution.

Can anybody please guide me on how to tackle this problem? Thanks!

Abdulrahman Bres
Adarsha Murthy
  • Can you list each bucket (category) as a separate bullet? Are there seven buckets listed in your question or just one? Which review would belong to the Price category? – Adnan S Feb 10 '18 at 04:34
  • Sure, Adnan: 1. Quality; 2. Customer; 3. Support; 4. Positive Feedback; 5. Price; 6. Technology. If no sentiment matches a category, then that category can be zero %, which ultimately tells the product manufacturer that they need not worry about it. Hope that answers your query. – Adarsha Murthy Feb 10 '18 at 09:04

1 Answer

Most NLP frameworks can handle multi-class classification. Word counts by themselves in R are unlikely to be very accurate. A Python library you can explore is spaCy. Commercial APIs from Google, AWS, and Microsoft can also be used. You will need quite a few examples per category for training. Feel free to post your code and the problem or performance gap you see for further help.
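
As a rough illustration of what "a few examples per category for training" looks like in practice, here is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. It does not use spaCy or any commercial API. The training sentences are adapted from the question or invented, and the labels attached to them, the `TRAIN` list, and the `NaiveBayes` class are all assumptions made up for this example. The Customer category is left out because the question has no obvious example for it. A real data set would need many more labelled reviews per category.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labelled examples. The labels are assumptions made for
# illustration; they are not part of the original review dump.
TRAIN = [
    ("The built quality is not that great", "Quality"),
    ("Poor quality LED screen showing a straight line", "Quality"),
    ("Customer support is very poor", "Support"),
    ("Support team never answered my call", "Support"),
    ("Works great. Great product", "Positive Feedback"),
    ("Love this product, picture perfect", "Positive Feedback"),
    ("Too expensive for what it offers", "Price"),
    ("Good value for the price", "Price"),
    ("I cannot use my smart device to connect to this TV", "Technology"),
    ("Very complicated options and documentation", "Technology"),
]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of reviews
        self.vocab = set()
        for text, label in examples:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def scores(self, text):
        """Return the log-probability score of each label for the text."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        result = {}
        for label, n_docs in self.doc_counts.items():
            n_tokens = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_tokens + len(self.vocab)))
            result[label] = score
        return result

    def predict(self, text):
        """Return the single highest-scoring label."""
        s = self.scores(text)
        return max(s, key=s.get)


model = NaiveBayes().fit(TRAIN)
print(model.predict("The price is too expensive"))  # Price
```

This picks exactly one category per review. If a review can belong to several buckets at once, one option is to train a separate yes/no classifier per category instead of a single multi-class model.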

Adnan S
  • Thanks Adnan and Sam. I am following this link, http://brandonrose.org/clustering, which matches my requirement. But yes, as Adnan mentioned, we need to have a few sample examples per category for training. It's working now; I will update the details ASAP. – Adarsha Murthy Feb 17 '18 at 06:47