I have a list of questions in a text file, extracted from a website. I am new to NLTK (in Python) and am working through the initial chapters of http://shop.oreilly.com/product/9780596516499.do. Could anybody help me categorize my questions under different headings? I don't know the headings in advance, so how can I create the headings and then categorize the questions under them?

OmGanesh

1 Answer

Your task consists of document clustering, where each question is a document, and cluster labeling, where each label designates a topic. Note that if your questions are short and/or hard to separate, e.g. because they belong to similar categories, the quality of the result will not be very high.

Take a look at this simple recipe for document clustering, and at the first and second related questions.

As a baseline for labels, try the words with the highest tf-idf weights in each cluster, taken either from the words of the cluster's documents or from the cluster centroid.
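To make the two steps concrete, here is a minimal, standard-library-only sketch of clustering followed by centroid-based labeling. Everything in it is an assumption made for illustration: the toy questions, the tiny `STOPWORDS` set (NLTK's stopwords corpus could be used instead), the smoothed idf variant, the farthest-first seeding, and the helper names `tfidf_vectors`, `kmeans` and `top_terms`. In practice a library such as scikit-learn would normally do this work; treat the code as a baseline, not a definitive implementation.

```python
import math
import re
from collections import Counter

# Hypothetical toy data and stop-word list, for illustration only.
questions = [
    "How do I sort a list in Python?",
    "How do I reverse a list in Python?",
    "What is a Python list comprehension?",
    "How do I bake sourdough bread?",
    "Why is my sourdough bread so dense?",
    "What flour is best for sourdough bread?",
]
STOPWORDS = {"a", "do", "for", "how", "i", "in", "is", "my", "so", "the", "what", "why"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one unit-length {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        # Smoothed idf: log((1 + n) / (1 + df)) + 1 (one common variant).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors (cosine similarity if both are unit length)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Sum the member vectors and rescale the result to unit length."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Cosine-similarity k-means with deterministic farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the document least similar to every seed chosen so far.
        centroids.append(min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    assignment = None
    for _ in range(iterations):
        new = [max(range(k), key=lambda i: dot(v, centroids[i])) for v in vectors]
        if new == assignment:
            break  # converged
        assignment = new
        for i in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == i]
            if members:
                centroids[i] = centroid(members)
    return assignment, centroids


def top_terms(centroid_vec, n_terms=2):
    """Label a cluster with its highest-weighted centroid terms."""
    ranked = sorted(centroid_vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


vectors = tfidf_vectors(questions)
assignment, centroids = kmeans(vectors, k=2)
labels = [top_terms(c) for c in centroids]

for question, cluster in zip(questions, assignment):
    print(cluster, labels[cluster], question)
```

The number of clusters `k` has to be chosen by hand here. With real questions it is worth trying several values and checking whether the resulting labels make sense as headings.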

Nikita Astrakhantsev