Unlabeled text data containing messages

Question

I am working on a text dataset containing messages from users on a website. The site would not let me post the image directly, so it is linked instead: dataframe of the first five rows.

By reading those messages, I want to find out the intent of each user: buyer, seller, or neutral. I have tried topic modelling with both LDA and NMF, but neither gives me an answer: I get very different topics and cannot find a way to relate them to buyer, seller, or neutral. I also cannot label the data manually, because the dataset is huge (about 200,000 rows). Which technique or algorithm can I use to solve this problem?

Please check [Which site?](https://meta.stackexchange.com/questions/129598/which-computer-science-programming-stack-exchange-do-i-post-in) for general issues. — Prune, Oct 09 '19 at 21:02
The basic problem you have is one of information. With *no* input, all that a model can do for you is to group messages by derived criteria; it's quite reasonable to think that LDA would group these by level of vocabulary used, or proportion of emojis and punctuation. If you want a directed classification you *must* give the model enough guidance to begin the process; there is no "read my mind" algorithm. — Prune, Oct 09 '19 at 21:05

score 0 · Answer 1 · answered Oct 09 '19 at 21:02

The algorithm you tried, LDA (I'm not familiar with the other one), is, as you said, a topic-modelling algorithm, which isn't very helpful in this case.

What I'd suggest is to label a chunk of messages for each category:

seller
buyer
neutral

This transforms the problem you're facing into a classification problem. You can then choose any classification algorithm to classify the messages into one of the categories.
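
As a rough illustration of that idea, here is a minimal sketch of a multinomial Naive Bayes classifier written with only the Python standard library. The six labelled messages are hypothetical examples made up for this sketch, not rows from the asker's dataset, and the `tokenize` helper and `NaiveBayes` class are illustrative names. Naive Bayes is just one possible choice of classifier here; in practice you would label a much larger sample and probably use an existing library implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        # Number of training messages per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Word frequencies per class.
        self.word_counts = defaultdict(Counter)
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        # Vocabulary = every word seen in any class.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, message):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Work in log space to avoid underflow on long messages.
            score = math.log(n_docs / total_docs)
            for word in tokenize(message):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical hand-labelled sample (assumption: invented for illustration).
labelled = [
    ("I want to buy a used laptop", "buyer"),
    ("Looking to purchase a bike, what is the price?", "buyer"),
    ("Selling my old phone, good condition", "seller"),
    ("I have a sofa for sale", "seller"),
    ("Thanks for the quick reply", "neutral"),
    ("Hello, how are you today?", "neutral"),
]
messages, labels = zip(*labelled)
model = NaiveBayes().fit(messages, labels)

prediction = model.predict("I would like to buy a phone")
print(prediction)
```

Once a model like this is trained on the labelled chunk, `predict` can be applied to the remaining unlabelled messages, and a held-out part of the labelled sample can be used to check how well it does.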

For reference, I'd suggest looking at this similar problem for inspiration: https://towardsdatascience.com/applied-text-classification-on-email-spam-filtering-part-1-1861e1a83246

Yoel Nisanov

Labelling manually is a time-consuming task. If I don't find any other technique, I'll have to do just that. – Amir Khan Oct 09 '19 at 21:58
It obviously is a time-consuming task, but unfortunately there are no fully automated techniques for the task you're asking to perform. From what I know, what I suggested is the best answer you'll get. – Yoel Nisanov Oct 11 '19 at 09:09
