As part of my academic research project, I am trying to build an application that takes a set of URLs retrieved from the web. The task is to classify each of these URLs into a category.
For instance, the following URL is about cricket: http://www.espncricinfo.com/icc_cricket_worldcup2011/content/current/story/499851.html If I give this URL to the classifier, it should output the category "Sports".
For this I am using the LingPipe classifier. I followed the classification tutorial and ran the demo in the demo folder. I downloaded the 20 Newsgroups data set from the following link: http://people.csail.mit.edu/people/jrennie/20Newsgroups
Later, I decreased the training sample size from 20 to 8 and ran the classification demo again. It trained on the data and tested it successfully.
However, do I need to train the classifier every time I want to find the category of a document? Running the classification currently takes 4 minutes for training and testing together.
Can I store the trained data once and perform the classification several times?
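To make the question concrete, here is a minimal sketch of the pattern I have in mind, written with plain Java serialization. `ToyModel`, `train`, `save`, `load` and the file name `toy-model.bin` are hypothetical stand-ins for illustration only; none of them are LingPipe classes or methods, and whether LingPipe's classifiers can be stored in a comparable way is exactly what I am asking.

```java
import java.io.*;
import java.util.*;

class TrainOnceDemo {

    // Hypothetical stand-in for a trained classifier; NOT a LingPipe class.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Set<String>> keywords = new LinkedHashMap<>();

        // "Training": remember which words belong to which category.
        void learn(String category, String... words) {
            keywords.computeIfAbsent(category, k -> new HashSet<>())
                    .addAll(Arrays.asList(words));
        }

        // Return the category whose keyword set matches the most tokens.
        String classify(String text) {
            String best = null;
            int bestScore = -1;
            for (Map.Entry<String, Set<String>> e : keywords.entrySet()) {
                int score = 0;
                for (String token : text.toLowerCase().split("\\W+")) {
                    if (e.getValue().contains(token)) {
                        score++;
                    }
                }
                if (score > bestScore) {
                    bestScore = score;
                    best = e.getKey();
                }
            }
            return best;
        }
    }

    // Stand-in for the slow training step (the 4 minutes in my case).
    static ToyModel train() {
        ToyModel model = new ToyModel();
        model.learn("Sports", "cricket", "football", "match");
        model.learn("Technology", "software", "computer", "java");
        return model;
    }

    // Write the trained model to disk once.
    static void save(ToyModel model, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Read the stored model back without retraining.
    static ToyModel load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (ToyModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = new File("toy-model.bin"); // hypothetical file name
        if (!file.exists()) {
            save(train(), file); // train only when no stored model exists
        }
        ToyModel model = load(file);
        System.out.println(model.classify("cricket world cup match report"));
    }
}
```

With this pattern, only the first run pays the training cost; every later run just loads the file and classifies. I would like to do the equivalent with the classifier trained by the LingPipe demo.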