Identifying interest / topic from text

Question

I am trying to build a model that identifies the interest category / topic of a supplied text. For example, the text:

Shop for Bridal Wedding Sarees from our exhausting variety of beautiful and designer sarees. Get great deals, quality stitching and Free International delivery.

would resolve to a top-level category such as:

Fashion or Wedding Fashion

To achieve this, I have used Latent Dirichlet allocation (LDA), a topic model that infers topics (probability distributions over words) from how words co-occur across a set of documents.
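For readers new to LDA, the core of the model can be sketched with collapsed Gibbs sampling in plain Python. This is not the author's script (which is not shown), and it is not how gensim or scikit-learn fit LDA (both use variational inference); it only illustrates what the model learns: each topic is a probability distribution over words and each document is a mixture of topics. The two "documents" are the example ad split by hand, lowercased and without stemming, so the tokens differ from the stemmed ones in the output below. LDA is designed to find themes shared across many documents, so on a single short text there is very little co-occurrence signal, which is consistent with the near-flat weights shown below.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.5, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists. Returns (topic_word, doc_topic), where
    topic_word[k] maps each word to P(word | topic k) and
    doc_topic[d][k] is P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [defaultdict(int) for _ in range(num_topics)]
    n_k = [0] * num_topics
    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


# The example ad, split by hand into two short token lists (an assumption for illustration).
docs = [
    "shop bridal wedding sarees exhausting variety beautiful designer sarees".split(),
    "great deals quality stitching free international delivery".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:5]
    print(f"topic #{k}:", " + ".join(f"{dist[w]:.3f}*{w}" for w in top))
```

Note that nothing in this output names a category: the model only ever sees the words of the input, which is the core of the problem described below.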

This gave me the following topics for the document, but I can't find a way to map them to a human-readable form:

topic #0 (0.500): 0.100*sare + 0.060*intern + 0.060*get + 0.060*deal + 0.060*exhaust + 0.060*design + 0.060*free + 0.060*qualiti + 0.060*shop + 0.060*great

topic #1 (0.500): 0.063*sare + 0.063*beauti + 0.063*deliveri + 0.063*stitch + 0.063*varieti + 0.063*wed + 0.062*bridal + 0.062*great + 0.062*shop + 0.062*qualiti
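Before any mapping can happen, the printed topics have to be turned back into data. The sketch below parses lines in exactly the format shown above (the regular expression is an assumption fitted to these two lines) and names each topic by its top-ranked terms, the approach alexis raises in the comments. It also shows the limitation: the "names" are stems taken from the text itself, and because almost all weights tie (0.060, or 0.062/0.063), the ranking beyond the first term is essentially arbitrary. If the script is gensim-based (the output format suggests it, but the script is not shown), `LdaModel.show_topic(topicid, topn=10)` returns these (word, probability) pairs directly, so parsing is only needed for logged text.

```python
import re

# Matches "weight*term" pairs such as 0.100*sare (quotes around the term are optional).
TERM_PATTERN = re.compile(r'([0-9.]+)\*"?([^"\s+]+)"?')


def parse_topic(line):
    """Split 'topic #0 (0.500): 0.100*sare + ...' into the topic id and (term, weight) pairs."""
    header, _, body = line.partition(":")
    topic_id = int(re.search(r"#(\d+)", header).group(1))
    pairs = [(term, float(weight)) for weight, term in TERM_PATTERN.findall(body)]
    return topic_id, pairs


def label_topic(pairs, k=3):
    """Name a topic by its k highest-weighted terms; sorted() is stable, so ties keep input order."""
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return " / ".join(term for term, _ in ranked[:k])


lines = [
    "topic #0 (0.500): 0.100*sare + 0.060*intern + 0.060*get + 0.060*deal + 0.060*exhaust"
    " + 0.060*design + 0.060*free + 0.060*qualiti + 0.060*shop + 0.060*great",
    "topic #1 (0.500): 0.063*sare + 0.063*beauti + 0.063*deliveri + 0.063*stitch + 0.063*varieti"
    " + 0.063*wed + 0.062*bridal + 0.062*great + 0.062*shop + 0.062*qualiti",
]
topics = dict(parse_topic(line) for line in lines)
for topic_id, pairs in topics.items():
    print(topic_id, label_topic(pairs))
# 0 sare / intern / get
# 1 sare / beauti / deliveri
```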

I used this script to implement the above.

So my question is: how can I map the topics identified above to a human-readable category such as Fashion?
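One way to get from topics to a fixed set of categories without any training data is a hand-written seed lexicon: list a few keywords per category, in the same stemmed form as the LDA output, and score each topic by the total weight of the seed terms it contains. This is a swapped-in rule-based baseline, not something proposed in the thread; the category names come from the question, but the seed lists below are hypothetical and would have to be much larger in practice. It will miss any category whose vocabulary the lists do not cover, which is why the commenters recommend a trained classifier.

```python
# Hypothetical seed lists: a few stemmed keywords per category (illustrative only).
CATEGORY_SEEDS = {
    "Fashion": {"sare", "bridal", "design", "stitch", "wed", "dress"},
    "Food": {"recip", "restaur", "cook", "dish"},
    "Sports": {"match", "team", "score", "leagu"},
    "Travel": {"flight", "hotel", "trip", "destin"},
    "Technology": {"softwar", "phone", "comput", "app"},
}


def map_topic_to_category(pairs, seeds=CATEGORY_SEEDS):
    """Score each category by the summed weight of its seed terms in the topic; return the best, or None."""
    scores = {
        category: sum(weight for term, weight in pairs if term in words)
        for category, words in seeds.items()
    }
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)


# Topic #1 from the question, as (term, weight) pairs.
topic1 = [("sare", 0.063), ("beauti", 0.063), ("deliveri", 0.063), ("stitch", 0.063),
          ("varieti", 0.063), ("wed", 0.063), ("bridal", 0.062), ("great", 0.062),
          ("shop", 0.062), ("qualiti", 0.062)]
category, score = map_topic_to_category(topic1)
print(category, round(score, 3))  # Fashion 0.251
```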

Um, ok, so if you've done all this, what is keeping you from just extracting the top-ranked keyword(s) from each result? What do you actually want to see as your "human readable category" for the two examples you give? — alexis, Oct 03 '16 at 12:22
Given that neither "Fashion" nor "Wedding Fashion" is part of the text, I understand you are actually looking to categorize the text rather than extract a chunk that represents the text. I see this more as a classification problem than a topic modelling one. — bogs, Oct 03 '16 at 14:29
@bogs You are right. I want to categorize the given text, e.g. as Sports, Technology, Fashion, Food, Travel, etc. Could you suggest a classification algorithm that would help solve this kind of problem in Python / NLTK? — GBD, Oct 04 '16 at 06:58
@alexis I want to categorize the text. If the text talks about food, I want to tag it as Food; if it talks about sports, I want to tag it as Sports; and likewise for Travel, Technology, etc. — GBD, Oct 04 '16 at 07:01
You'll need to train a classification model on the output of your LDA model to predict the set of tags you are interested in. — aberger, Oct 04 '16 at 15:42
There may be pre-built models out there, but if you want a specific set of tags then you'll have to train one yourself. — aberger, Oct 04 '16 at 15:45
If you can figure out the set of "human readable" categories you want to see, you could train a classifier using the weighted terms as features. But you'd need a (substantial) training corpus that you've already annotated with the desired categories. — alexis, Oct 04 '16 at 17:40
@GBD I just said that there may be models out there, but I have no idea if there are or what they're called. They would also be trained on different data and would therefore not work as well for you as a model trained on your data! — aberger, Oct 05 '16 at 14:26
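aberger's suggestion, training a classifier on the output of the LDA model, can be sketched by using each document's topic mixture as its feature vector. Nearest centroid is used here only because it fits in a few lines of plain Python; any classifier could consume the same vectors. The vectors and labels below are hypothetical toy values standing in for a hand-annotated training set. If the model is gensim's (an assumption about the author's setup), per-document topic mixtures are available through `LdaModel.get_document_topics`.

```python
import math
from collections import defaultdict


def train_centroids(vectors, labels):
    """Average the document-topic vectors of each label into one centroid per label."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest (Euclidean distance) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical document-topic vectors from a 3-topic LDA model, with hand-assigned labels.
train_x = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
train_y = ["Fashion", "Fashion", "Sports", "Sports", "Food"]
centroids = train_centroids(train_x, train_y)
print(predict(centroids, [0.75, 0.15, 0.10]))  # Fashion
```

The catch, as the comments point out, is that the labelled training set has to exist first; the topic vectors only replace the raw words as features.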
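alexis's variant uses the (weighted) terms themselves as features and needs an annotated corpus. A multinomial Naive Bayes classifier is a common first choice for this kind of text categorization. NLTK ships `nltk.NaiveBayesClassifier`, which works on feature dictionaries; the sketch below is instead a dependency-free multinomial version with add-one smoothing. The five training texts and their labels are made up for illustration: a usable model needs many labelled examples per category, and new text must be tokenized (and stemmed, if stemming is used) the same way as the training text.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, smoothing=1.0):
    """Multinomial Naive Bayes: per-label log priors and smoothed term log-probabilities."""
    label_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
    vocab = {t for tokens in docs for t in tokens}
    model = {}
    for label, n_docs in label_counts.items():
        total = sum(term_counts[label].values()) + smoothing * len(vocab)
        model[label] = (
            math.log(n_docs / len(docs)),
            {t: math.log((term_counts[label][t] + smoothing) / total) for t in vocab},
        )
    return model


def classify_nb(model, tokens):
    """Pick the label with the highest log-posterior; terms unseen in training are ignored."""
    def score(label):
        prior, log_probs = model[label]
        return prior + sum(log_probs[t] for t in tokens if t in log_probs)
    return max(model, key=score)


# Hypothetical annotated corpus (tokens, label).
train_docs = [
    "bridal saree designer stitching".split(),
    "wedding dress designer collection".split(),
    "football match score league".split(),
    "team wins league match".split(),
    "recipe cooking dinner restaurant".split(),
]
train_labels = ["Fashion", "Fashion", "Sports", "Sports", "Food"]
model = train_nb(train_docs, train_labels)
print(classify_nb(model, "designer bridal saree deals".split()))  # Fashion
```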

0 Answers