Get category from text or keywords

Question

So far I have managed to cluster tweets and identify "trending topics" using 3 different approaches (LDA, SVD and k-means) with k=12. The problem now is to assign a category to each of these topics.

I used the Alchemy API for text categorization. However, I only get the "recreation" category back as the response for every topic. I think this is because tweets are full of noise and slang words (I've already done data cleansing and pre-processing, though). I would like to know whether there is any NLP library or statistical algorithm capable of classifying documents into a specific category (i.e. getting a category out of a text or a set of keywords).
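As an illustration of the kind of statistical approach asked about, one simple baseline (not what Alchemy does, and not from the original thread) is to treat each topic's top keywords as a bag of words and compare it, via cosine similarity, against a seed vocabulary for each candidate category. The seed lists in `CATEGORY_SEEDS` and the function names are hypothetical; in practice you would derive category vocabularies from labeled data or an existing taxonomy rather than write them by hand.

```python
import math
from collections import Counter

# Hypothetical seed vocabularies, for illustration only. A real system would
# learn these from labeled documents or take them from a taxonomy.
CATEGORY_SEEDS = {
    "sports": ["match", "goal", "team", "league", "score", "player"],
    "politics": ["election", "vote", "senate", "president", "policy", "campaign"],
    "technology": ["app", "software", "phone", "android", "startup", "update"],
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def categorize(topic_keywords, seeds=CATEGORY_SEEDS, min_score=0.0):
    """Return (best_category, score) for a topic's keywords, or (None, 0.0)
    when no category scores above min_score."""
    topic_vec = Counter(w.lower() for w in topic_keywords)
    scores = {cat: cosine(topic_vec, Counter(words)) for cat, words in seeds.items()}
    best = max(scores, key=scores.get)
    if scores[best] <= min_score:
        return None, 0.0
    return best, scores[best]


if __name__ == "__main__":
    # Keywords as they might come out of one LDA topic or k-means cluster.
    topic = ["goal", "team", "fans", "match", "tonight"]
    print(categorize(topic))
```

With the sample topic the function picks `sports`, since three of its five keywords overlap that seed list. A topic sharing no words with any seed list comes back as `(None, 0.0)`, which is one way to avoid forcing every topic into a single default category.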

score 0 · Answer 1 · answered Aug 06 '15 at 20:35

Sure, I know the Carrot2 project; check it here:

http://project.carrot2.org/

Behind the scenes it uses an algorithm that also infers names (labels) for the clusters it finds. If you want the algorithm details, you can find them here:

http://project.carrot2.org/publications/osinski-2003-lingo.pdf

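Basically, it applies LSI (via SVD) to the term-document matrix to discover abstract concepts, and then runs a cluster label induction step that matches frequent phrases to those concepts to name each cluster. Hope it helps.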
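To make those two steps concrete, here is a much-simplified sketch of the label-induction idea described in the Lingo paper, written in Python with NumPy. It is not Carrot2's implementation: it assumes you already have a term-document matrix and a list of candidate phrases (Lingo extracts frequent phrases itself), the number of concepts `k` is supplied by the caller, and it skips label pruning and assigning documents to clusters. The function name `lingo_style_labels` is hypothetical.

```python
import numpy as np


def lingo_style_labels(term_doc, terms, candidate_phrases, k):
    """Simplified Lingo-style cluster label induction (illustrative sketch).

    term_doc: (n_terms, n_docs) term-document matrix, e.g. tf-idf weights.
    terms: vocabulary words, in the same order as the rows of term_doc.
    candidate_phrases: non-empty phrases, each a list of words from `terms`.
    k: number of abstract concepts (clusters) to label.
    Returns one label string per concept.
    """
    # LSI step: the leading left singular vectors span the "abstract concepts"
    # in term space; each column of u is a unit vector.
    u, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    concepts = u[:, :k]  # shape (n_terms, k)

    # Represent each candidate phrase as a unit-length vector in term space.
    index = {t: i for i, t in enumerate(terms)}
    phrase_mat = np.zeros((len(terms), len(candidate_phrases)))
    for j, phrase in enumerate(candidate_phrases):
        for word in phrase:
            phrase_mat[index[word], j] = 1.0
    phrase_mat /= np.linalg.norm(phrase_mat, axis=0, keepdims=True)

    # Cosine similarity between every concept and every phrase. The sign of a
    # singular vector is arbitrary, so this sketch compares absolute values.
    sims = np.abs(concepts.T @ phrase_mat)  # shape (k, n_phrases)
    return [" ".join(candidate_phrases[j]) for j in sims.argmax(axis=1)]


if __name__ == "__main__":
    terms = ["goal", "match", "team", "vote", "election", "senate"]
    # Toy data: two "sports" documents and two "politics" documents.
    term_doc = np.array(
        [
            [2, 2, 0, 0],
            [2, 2, 0, 0],
            [2, 2, 0, 0],
            [0, 0, 1, 1],
            [0, 0, 1, 1],
            [0, 0, 1, 1],
        ],
        dtype=float,
    )
    phrases = [["goal", "match"], ["vote", "election"], ["team"]]
    print(lingo_style_labels(term_doc, terms, phrases, k=2))
```

In this toy matrix the sports block carries larger weights, so its concept comes first and is labeled "goal match", while the politics concept is labeled "vote election". The two-word phrase wins over the single word "team" because it aligns with more of the concept vector.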
