
Suppose we have 10,000 text files that we would like to classify into categories such as politics, health, weather, sports, science, education, and so on. I need a training data set for classifying text documents, and I am using the Naive Bayes classification algorithm. Can anyone help me find suitable data sets? Or is there another way to get the classification done? I am new to machine learning, so please explain your answer completely.

Example:

**Sentence** → **Output**

1. Obama won election. → political
2. India won by 10 wickets. → sports
3. Tobacco is more dangerous. → health
4. Newton's laws of motion can be applied to a car. → science

Is there any way to classify these sentences into their respective categories?

  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [on topic](http://stackoverflow.com/help/on-topic) applies here. – Prune Oct 26 '15 at 15:44

1 Answer


Have you tried googling it? There are plenty of datasets for text categorization. The classic one is Reuters-21578 (https://archive.ics.uci.edu/ml/datasets/Reuters-21578+Text+Categorization+Collection); another well-known one, mentioned in almost every ML book, is 20 Newsgroups: http://web.ist.utl.pt/acardoso/datasets/

There are many others, just one Google query away. Load them, adjust them slightly if needed, and train your classifier on those datasets.
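To make "load and train" concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written with the Java standard library only. The class name `TinyNaiveBayes` and its `train`/`classify` methods are hypothetical names chosen for illustration; they are not the API of any particular library, and the tokenizer is deliberately crude (it keeps only ASCII letters and digits). In practice you would call `train` once per labelled document from Reuters-21578 or 20 Newsgroups rather than on four sentences.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a tiny multinomial Naive Bayes text classifier.
class TinyNaiveBayes {
    // Word counts per category, total words per category, documents per category.
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase the text and split on anything that is not an ASCII letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    // Add one labelled document to the model.
    public void train(String category, String text) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the category with the highest log posterior, or null if untrained.
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // Log prior: fraction of training documents in this category.
            double score = Math.log((double) docCounts.get(category) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                int count = counts.getOrDefault(token, 0);
                // Add-one smoothing so unseen words do not zero out the score.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("political", "Obama won election");
        nb.train("sports", "India won by 10 wickets");
        nb.train("health", "Tobacco is more dangerous");
        nb.train("science", "Newtons laws of motion can be applied to car");
        System.out.println(nb.classify("India won the match by 5 wickets"));
    }
}
```

With only one sentence per category the model is of course very weak; its accuracy depends almost entirely on how many labelled documents each category receives and on how well the training categories match the ones you want to predict.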

Maksim Khaitovich
  • I have downloaded both Reuters and 20 Newsgroups, but my problem is that I don't understand how to use them in my system. My Naive Bayes classifier takes input as `trainingFiles.put(Classifier_NAME, NaiveBayesExample.class.getResource(Filename_HERE));` – nikhil channa Oct 26 '15 at 21:24
  • Okay -- what happened when you used one of the files you found to train a model? You already have the file name; choose the classifier you want, specify that, and make the call. – Prune Oct 26 '15 at 23:43
  • Hello Prune, it basically works, but only as a weak classifier. – nikhil channa Oct 29 '15 at 09:42
  • Crime news is classified as Entertainment, so I want a data set that covers all kinds of news categories, ranging from politics to health. – nikhil channa Oct 29 '15 at 09:43