I am trying to build a model that identifies the interest category / topic of a supplied text. For example:

Shop for Bridal Wedding Sarees from our exhausting variety of beautiful and designer sarees. Get great deals, quality stitching and Free International delivery.

would resolve to a top level category like:

Fashion or Wedding Fashion

To achieve this, I have used Latent Dirichlet Allocation (LDA), a topic model that generates topics based on word frequency across a set of documents.

This gave me the topics below for the document, but I can't find a way to map them to a human-readable format:

topic #0 (0.500): 0.100*sare + 0.060*intern + 0.060*get + 0.060*deal + 0.060*exhaust + 0.060*design + 0.060*free + 0.060*qualiti + 0.060*shop + 0.060*great

topic #1 (0.500): 0.063*sare + 0.063*beauti + 0.063*deliveri + 0.063*stitch + 0.063*varieti + 0.063*wed + 0.062*bridal + 0.062*great + 0.062*shop + 0.062*qualiti
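
Any mapping step first needs these lines as (term, weight) pairs. A minimal sketch of that parsing, assuming each line looks exactly like the output above; `parse_topic` is a hypothetical helper name, not part of any library:

```python
import re

# Matches one "weight*term" pair, e.g. 0.100*sare (optional quotes around the term).
TERM_RE = re.compile(r'(\d*\.?\d+)\*"?([^\s"+]+)"?')

def parse_topic(line):
    """Hypothetical helper: return (term, weight) pairs, highest weight first."""
    # Drop the "topic #0 (0.500):" prefix; assumes the line contains a colon.
    _, _, body = line.partition(":")
    pairs = [(term, float(weight)) for weight, term in TERM_RE.findall(body)]
    # sorted() is stable, so terms with equal weight keep their original order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

line = "topic #0 (0.500): 0.100*sare + 0.060*intern + 0.060*get"
top_terms = parse_topic(line)
print(top_terms)
```

The terms come back stemmed ("sare", "qualiti"), so they would still need to be matched against stemmed vocabulary in whatever step assigns a category.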

I used this script to implement the above.

So the question is: how do I map the topics identified above to a human-readable category like Fashion?

GBD
  • Did you forget to ask your question? – alexis Oct 01 '16 at 17:07
  • @alexis Please check, I have edited my question. – GBD Oct 03 '16 at 04:40
  • Um, ok, so if you've done all this, what is keeping you from just extracting the top-ranked keyword(s) from each result? What do you actually want to see as your "human readable category" for the two examples you give? – alexis Oct 03 '16 at 12:22
  • Given that neither "Fashion" nor "Wedding Fashion" is part of the text, I understand you are actually looking to categorize the text rather than extract a chunk that represents the text. I see this as more of a classification problem than a topic modelling one. – bogs Oct 03 '16 at 14:29
  • @bogs You are right. I want to categorize the given text, for example as Sports, Technology, Fashion, Food, Travel, etc. Could you suggest which classification algorithm would help solve this kind of problem in Python / NLTK? – GBD Oct 04 '16 at 06:58
  • @alexis I want to categorize the text. If the text talks about food, I want to tag it as Food. If it talks about sports, I want to tag it as Sports. The same goes for Travel, Technology, etc. – GBD Oct 04 '16 at 07:01
  • You'll need to train a classification model on the output of your LDA model to predict the set of tags you are interested in. – aberger Oct 04 '16 at 15:42
  • There may be pre-built models out there, but if you want a specific set of tags then you'll have to train one yourself. – aberger Oct 04 '16 at 15:45
  • If you can figure out the set of "human readable" categories you want to see, you could train a classifier using the weighted terms as features. But you'd need a (substantial) training corpus that you've already annotated with the desired categories. – alexis Oct 04 '16 at 17:40
  • @aberger Can you tell me the names of some pre-built models? – GBD Oct 05 '16 at 04:45
  • @GBD I just said that there may be models out there, but I have no idea if there are or what they're called. They would also be trained on different data and would therefore not work as well for you as a model trained on your data! – aberger Oct 05 '16 at 14:26
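
The classification approach suggested in the comments could look roughly like the sketch below. It is a small multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. Note the assumptions: the training examples and category labels are made up for illustration, the features are raw word counts rather than the weighted LDA terms mentioned above, and `train` / `predict` are hypothetical names. A usable model would need a much larger annotated corpus.

```python
import math
from collections import Counter, defaultdict

# Toy, hand-labeled examples (invented for illustration only).
TRAIN = [
    ("bridal wedding sarees designer stitching", "Fashion"),
    ("designer dresses shoes and handbags sale", "Fashion"),
    ("football match score league goal", "Sports"),
    ("cricket team wins the final match", "Sports"),
    ("pasta recipe with fresh tomato sauce", "Food"),
    ("restaurant menu dessert and coffee", "Food"),
]

def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would also stem and drop stop words.
    return text.lower().split()

def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    tokens = [tok for tok in tokenize(text) if tok in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothing so unseen label/word pairs are not zero.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(predict("Shop for Bridal Wedding Sarees", model))
```

With this toy data the example sentence from the question is tagged Fashion only because "bridal", "wedding" and "sarees" appear in the Fashion training lines; the result says nothing about how such a model behaves on real text.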

0 Answers