
I am working on a text dataset containing messages from users on a website. The first five rows of the dataframe are shown in the linked image, as Stack Overflow does not allow me to post the image directly.

Reading those messages, I want to find out the intent of each user: whether they are a buyer, a seller, or neutral. I have tried topic modelling using both LDA and NMF, but it is not giving me answers: I get very different topics and cannot find a way to relate them to buyer, seller, or neutral. I also cannot label the data manually, because it is a huge dataset containing 200,000 rows. Which technique or algorithm can I use to solve this problem?

Amir Khan
  • Please check [Which site?](https://meta.stackexchange.com/questions/129598/which-computer-science-programming-stack-exchange-do-i-post-in) for general issues. – Prune Oct 09 '19 at 21:02
  • The basic problem you have is one of information. With *no* input, all that a model can do for you is to group messages by derived criteria; it's quite reasonable to think that LDA would group these by level of vocabulary used, or proportion of emojis and punctuation. If you want a directed classification you *must* give the model enough guidance to begin the process; there is no "read my mind" algorithm. – Prune Oct 09 '19 at 21:05

1 Answer


The algorithm you tried, LDA (I'm not familiar with the other one), is, as you said, a topic-modelling algorithm, which isn't very helpful in this case.

What I'd suggest is to label a chunk of messages for each category:

  1. seller
  2. buyer
  3. neutral

and transform the problem you're facing into a classification problem, then choose any classification algorithm to classify the remaining messages into one of the categories.
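As a rough illustration of that workflow, here is a minimal sketch that trains a small multinomial Naive Bayes classifier on a handful of hand-labelled messages and then predicts the intent of unseen ones. Naive Bayes is just one possible choice of classifier, not something the question requires. The example messages, the `NaiveBayesIntent` class, and the `tokenize` helper are all made up for illustration; a real labelled sample would need to be much larger and drawn from the actual dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesIntent:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of messages
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Tokens never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled sample (assumption: not taken from the real data).
labelled = [
    ("I want to buy a used laptop, what is your price?", "buyer"),
    ("Looking to purchase a bike, is it still available?", "buyer"),
    ("Selling my phone, good condition, price negotiable", "seller"),
    ("I have a sofa for sale, pickup only", "seller"),
    ("Hello, how are you today?", "neutral"),
    ("Thanks for the quick reply", "neutral"),
]
texts, labels = zip(*labelled)
model = NaiveBayesIntent().fit(texts, labels)

# Classify messages that were not part of the labelled sample.
print(model.predict("Is the laptop still available? I want to buy it"))
print(model.predict("Selling my bike, price negotiable"))
```

The point of the sketch is the shape of the process: label a sample by hand, fit a classifier on it, then apply the classifier to the unlabelled rows. Holding back part of the labelled sample to measure accuracy would show whether more labels are needed.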

For reference, I'd suggest looking at this similar problem for inspiration: https://towardsdatascience.com/applied-text-classification-on-email-spam-filtering-part-1-1861e1a83246

Yoel Nisanov
  • Labelling manually is a time-consuming task. If I don't find any other technique, I'll have to do it that way. – Amir Khan Oct 09 '19 at 21:58
  • It obviously is a time-consuming task, but unfortunately there are no fully automated techniques for the task you're asking to perform. From what I know, what I suggested is the best answer you'll get. – Yoel Nisanov Oct 11 '19 at 09:09