I have a list of questions in a text file, extracted from an online website. I am new to NLTK (in Python) and am going through the initial chapters of http://shop.oreilly.com/product/9780596516499.do. Can anybody help me categorize my questions under different headings? I don't know the headings in advance, so how do I create the headings and then categorize the questions under them?
1 Answer
Your task consists of document clustering, where each question is a document, and cluster labeling, where each label designates a topic. Note that if your questions are short and/or hard to separate (e.g. they belong to similar categories), the quality will not be very high.
Take a look at a simple recipe for document clustering, and at the related questions first and second.
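A minimal sketch of the clustering step, assuming plain Python with only the standard library (no scikit-learn), tf-idf features, and spherical k-means (cosine similarity) with deterministic farthest-first seeding. The function names `tokenize`, `tfidf_vectors`, and `kmeans` are hypothetical, and the number of clusters `k` has to be chosen by hand.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; nltk.word_tokenize plus
    # stopword removal would be a more thorough choice.
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    # Scale a sparse vector to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def dot(a, b):
    # Sparse dot product; iterate over the smaller dict.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    """One unit-length tf-idf vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    return [
        normalize({t: c * math.log(n / df[t]) for t, c in Counter(toks).items()})
        for toks in tokenized
    ]


def kmeans(vectors, k, max_iter=20):
    """Spherical k-means: returns a cluster index for each vector."""
    # Farthest-first seeding: start from the first vector, then repeatedly add
    # the vector least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # Update step: centroid = normalized sum (same direction as the mean) of members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                summed = Counter()
                for v in members:
                    summed.update(v)
                centroids[c] = normalize(dict(summed))
    return labels


questions = [
    "How do I install python packages with pip",
    "pip install fails for python package",
    "What is the capital city of France",
    "Which city is the capital of Germany",
]
print(kmeans(tfidf_vectors(questions), k=2))  # → [0, 0, 1, 1]
```

For real data, a library implementation such as scikit-learn's `TfidfVectorizer` and `KMeans` is the more usual choice; the sketch just makes each step explicit.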
As a baseline for labels, try the words with the highest tf-idf scores, computed either from each cluster's words or from the cluster centroids.
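A minimal sketch of the "max tf-idf words from cluster words" baseline, assuming the questions have already been grouped into clusters (lists of strings). Each cluster's pooled words are treated as one document, so a word ranks high when it is frequent in its own cluster but rare in the others. The helper names and the tiny `STOPWORDS` set are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); nltk.corpus.stopwords.words("english")
# is a fuller option once the NLTK stopwords corpus is downloaded.
STOPWORDS = {"how", "do", "i", "what", "is", "the", "of", "which", "with", "for"}


def tokenize(text):
    # Lowercase, keep alphabetic runs, drop stopwords.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def label_clusters(clusters, top_n=3):
    """Label each cluster (a list of question strings) with its top_n tf-idf words."""
    bags = [Counter(t for q in cluster for t in tokenize(q)) for cluster in clusters]
    n = len(bags)
    df = Counter(t for bag in bags for t in bag)  # number of clusters containing each word
    labels = []
    for bag in bags:
        total = sum(bag.values()) or 1
        scores = {t: (c / total) * math.log(n / df[t]) for t, c in bag.items()}
        # Highest score first; ties broken alphabetically for stable output.
        labels.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_n])
    return labels


clusters = [
    ["How do I install python packages with pip", "pip install fails for python package"],
    ["What is the capital city of France", "Which city is the capital of Germany"],
]
print(label_clusters(clusters))
# → [['install', 'pip', 'python'], ['capital', 'city', 'france']]
```

A word that appears in every cluster gets an idf of zero and never becomes a label. The centroid-based variant would instead sort the term weights of each cluster centroid produced by a clustering step such as k-means.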

Nikita Astrakhantsev