
I'm new to NLP/text processing and am building an application that needs to generate topics (Music, Games, Romance, History, etc.) from about two lines of input text.

I've decided to use Wikipedia's article base to help with this.

What would be the steps to "train" my program to recognize and categorize these topics from my input text?

wolfgang
  • Where does Wikipedia come into the picture? To train anything, you need input which is already categorized according to your criteria, which (by any stretch of imagination) a raw dump of Wikipedia text is not. – tripleee Apr 10 '15 at 04:45
  • But this is much too broad to be answered by anything less than an introductory textbook. Nominating to close. – tripleee Apr 10 '15 at 04:47

1 Answer


This is a broad question. For automated topic modeling, which is unsupervised (you don't need labeled training examples), you might want to look at Latent Dirichlet Allocation (LDA). In Python, gensim is a nice way to do LDA. I've used Weka in Java for classification tasks, which might be closer to what you're looking for. LightSide Researcher's Workbench also offers a GUI for text mining tasks.
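
To make the classification route a bit more concrete, here is a minimal sketch of the "train on labeled text, then categorize short input" idea. It is not what Weka or gensim does internally: it uses a hand-rolled multinomial naive Bayes classifier with add-one smoothing as a simple stand-in, written with only the Python standard library. The training snippets, the topic labels, and the function names (`tokenize`, `train`, `classify`) are all made up for illustration; a real system would need far more labeled text per topic.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labeled snippets (assumption: invented for this sketch).
TRAINING_DATA = [
    ("Music", "the band released a new album with guitar and drums"),
    ("Music", "she sang the song at the concert with the orchestra"),
    ("Games", "the player finished the level and beat the final boss"),
    ("Games", "a new console game with multiplayer and high score"),
    ("History", "the empire fell after the war in the ancient century"),
    ("History", "the king signed the treaty ending the medieval war"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per topic and documents per topic."""
    word_counts = defaultdict(Counter)  # topic -> word -> count
    doc_counts = Counter()              # topic -> number of training texts
    vocab = set()
    for label, text in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the topic with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training texts carrying this topic.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for words
            # that never occurred under this topic.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(classify("I love playing guitar in a band", model))
```

The point of the sketch is the shape of the pipeline: labeled examples go in, per-topic word statistics come out, and a short input is assigned to whichever topic makes its words most likely. Where the labeled examples come from is the hard part, as the comments on the question point out.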

Guerre