As part of an academic research project, I am building an application that retrieves a set of URLs from the web. The task is to classify each of these URLs into a category.

For instance, the following URL is about cricket: http://www.espncricinfo.com/icc_cricket_worldcup2011/content/current/story/499851.html If I give this URL to the classifier, it should output the category "Sports".

For this I am using the LingPipe classifier. I followed the classification tutorial and ran the demo in the demo folder, using the 20 Newsgroups data set downloaded from the following link: http://people.csail.mit.edu/people/jrennie/20Newsgroups

I then reduced the training sample size from 20 to 8 and ran the classification demo again. It trained on the data and tested it successfully.

My question is: do I need to train the classifier every time I want to determine the category of documents? A single run of the classification currently takes 4 minutes for training and testing combined.

Can I store the trained data once and perform the classification several times?

  • By the way, S.O. asks that you refrain from signatures. (It's also considered bad form to say, "Please try to find time to help me"). [See FAQ](http://stackoverflow.com/faq) – Chris Pfohl Jan 07 '13 at 18:21

1 Answer

You need to serialize the trained model to disk; you can then deserialize it later and have the classifier ready to go without retraining.

Once you have a trained classifier, use

    AbstractExternalizable.compileTo(classifier, modelFile);

to write the model to disk.

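To read it back in, you will need

    AbstractExternalizable.readObject(modelFile);

readObject returns a plain Object, so cast the result to the classifier type you need. See the Javadoc for AbstractExternalizable for details.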

The model will not be able to accept additional training events because it has been compiled.
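To make the train-once, load-many pattern concrete, here is a minimal sketch that uses only plain `java.io` object serialization, not LingPipe itself. `ToyModel` and `ModelStore` are hypothetical names invented for this illustration, and the keyword lookup is only a stand-in for a real trained classifier. With LingPipe you would call `compileTo` and `readObject` as shown above instead of the `save` and `load` helpers here.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a trained classifier: maps a keyword to a category.
class ToyModel implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Map<String, String> keywordToCategory = new HashMap<>();

    // "Training" here just records which keyword implies which category.
    void train(String keyword, String category) {
        keywordToCategory.put(keyword, category);
    }

    // Returns the category of the first known keyword found in the text.
    String classify(String text) {
        String lower = text.toLowerCase();
        for (Map.Entry<String, String> e : keywordToCategory.entrySet()) {
            if (lower.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return "Unknown";
    }
}

// Hypothetical helper: writes a model to disk once, reads it back many times.
class ModelStore {
    static void save(ToyModel model, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    static ToyModel load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (ToyModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File modelFile = File.createTempFile("toy-model", ".ser");
        modelFile.deleteOnExit();

        // Expensive step, done once: train and persist.
        ToyModel trained = new ToyModel();
        trained.train("cricket", "Sports");
        save(trained, modelFile);

        // Cheap step, done on every later run: load and classify.
        ToyModel restored = load(modelFile);
        System.out.println(restored.classify(
            "http://www.espncricinfo.com/icc_cricket_worldcup2011/content/current/story/499851.html"));
    }
}
```

In a real application the training step and the loading step would probably live in separate programs, so that the 4-minute training cost is paid only when the training data changes.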