
So far I have managed to cluster tweets and identify "trending topics" using three different approaches (LDA, SVD and k-means) with k=12. The problem now is to assign a category to each of these topics.

I used the Alchemy API for text categorization. However, I only get the "recreation" category as the response for each topic. I think this is because tweets are full of noise and slang words (although I have already done data cleansing and pre-processing). Is there an NLP library or statistical algorithm capable of assigning a document to a specific category, i.e. of getting a category out of a text or a set of keywords?

Chthonic Project

1 Answer


Sure, I know of the Carrot2 project. Check it out here:

http://project.carrot2.org/

Behind the scenes is an algorithm that also infers names for the categories. If you want the details of the algorithm, you can find them here:

http://project.carrot2.org/publications/osinski-2003-lingo.pdf

Basically, it uses LSI with SVD, followed by a cluster label induction step. Hope it helps.
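To make the "LSI with SVD, then label induction" idea a bit more concrete, here is a rough sketch in plain Python. It is not Carrot2's Lingo implementation: Lingo matches frequent phrases against the SVD concepts, whereas this toy version simply labels each latent concept with its single highest-weighted term. The function names and the sample documents are made up for illustration, and a small power iteration stands in for a real SVD routine so that the example has no dependencies.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix) with one row per term and one column per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted({term for c in counts for term in c})
    matrix = [[float(c[term]) for c in counts] for term in vocab]
    return vocab, matrix


def top_left_singular_vectors(matrix, k, iterations=300):
    """Approximate the k leading left singular vectors of `matrix`.

    Runs power iteration on the Gram matrix A * A^T and projects out the
    vectors already found (deflation). Assumes k does not exceed the rank.
    """
    n = len(matrix)
    gram = [[sum(x * y for x, y in zip(matrix[i], matrix[j])) for j in range(n)]
            for i in range(n)]
    found = []
    for _ in range(k):
        v = [1.0] * n  # deterministic starting vector
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in found:  # remove components along earlier concepts
                dot = sum(x * y for x, y in zip(w, u))
                w = [x - dot * y for x, y in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        found.append(v)
    return found


def induce_labels(docs, k):
    """Label each of the k latent concepts with its highest-weighted term."""
    vocab, matrix = term_document_matrix(docs)
    concepts = top_left_singular_vectors(matrix, k)
    return [vocab[max(range(len(vocab)), key=lambda i: abs(vec[i]))]
            for vec in concepts]


# Hypothetical, already-cleaned tweets covering two topics.
docs = [
    "football match goal football",
    "football goal team",
    "election vote election",
    "election vote party",
]
print(induce_labels(docs, k=2))
```

On real tweets you would probably want TF-IDF weights instead of raw counts and a proper SVD from a numerical library, but the structure stays the same: factor the term-document matrix, then pick a readable label for each latent concept.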

Dr VComas