I have tested SimpleTagger for sequence tagging from Mallet's command-line interface. I would now like to train on many files and run tests in batches. Is this also possible from Mallet's command line? I want a rough idea of how the algorithm performs on my task before I dive into the Java API.

I have seen that classification tasks can be run in batch from the command prompt.

  • Is it possible to use SimpleTagger in batch?
  • If not, can someone point me to reference code where sequence tagging has been done in batch using the Java API?

Somewhere I found a reference to "http://mallet.cs.umass.edu/index.php/Command_line_tutorial", but the link seems to be broken.

spaniard81

1 Answer

After some exploration, I learned that cc.mallet.fst.SimpleTagger cannot readily be used for batch evaluation. Instead, I found that cc.mallet.examples.TrainCRF is a handy piece of code (it uses SimpleTagger). It takes a training dataset and a test dataset (in Mallet sequence tagging format, with instances separated by a single blank line) as input arguments, and that's it.
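
To run TrainCRF over several train/test pairs in one go, a small wrapper can launch it once per pair. The following is only a sketch under assumptions: the class name BatchTrainCrf and its argument layout are hypothetical, the classpath has to be supplied by you, and it assumes TrainCRF accepts the training file and the test file as its two arguments, as described above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mallet.
public class BatchTrainCrf {

    // Builds the command line for one TrainCRF run.
    // Assumption: TrainCRF takes <trainFile> <testFile> as its two arguments.
    public static List<String> buildCommand(String classpath, String trainFile, String testFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("cc.mallet.examples.TrainCRF");
        cmd.add(trainFile);
        cmd.add(testFile);
        return cmd;
    }

    // Usage: java BatchTrainCrf <classpath> <train1> <test1> [<train2> <test2> ...]
    public static void main(String[] args) throws IOException, InterruptedException {
        // Expect the classpath followed by one or more train/test pairs.
        if (args.length < 3 || args.length % 2 == 0) {
            System.err.println("usage: BatchTrainCrf <classpath> <train> <test> [<train> <test> ...]");
            System.exit(1);
        }
        for (int i = 1; i + 1 < args.length; i += 2) {
            List<String> cmd = buildCommand(args[0], args[i], args[i + 1]);
            System.out.println("Running: " + String.join(" ", cmd));
            // inheritIO() forwards the child's output to this console.
            Process process = new ProcessBuilder(cmd).inheritIO().start();
            int exitCode = process.waitFor();
            System.out.println("Exit code: " + exitCode);
        }
    }
}
```

With a stock mallet-2.0.8 directory, the classpath is probably something like "class:lib/mallet-deps.jar" when run from the installation root, but check this against your own setup.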

I used the mallet-2.0.8 installation available on the Mallet page.

Be careful NOT to tune the model based on its performance on the test set. Ideally, do not check test-set performance at all until you have tuned the model sufficiently using the training set.
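
One way to follow this advice is to carve a development set out of the training data and tune against that. The following is a minimal sketch, not something Mallet provides: the class SequenceSplit and its methods are hypothetical, it assumes sequences are separated by blank lines, and it takes the last part of the file as the development set without shuffling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mallet.
public class SequenceSplit {

    // Groups the lines of a sequence-tagging file into sequences.
    // Assumption: one token per line, blank line(s) between sequences.
    public static List<List<String>> readSequences(List<String> lines) {
        List<List<String>> sequences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    sequences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line);
            }
        }
        if (!current.isEmpty()) {
            sequences.add(current);
        }
        return sequences;
    }

    // Returns two lists: index 0 is the training part, index 1 the development part.
    // The last devFraction of the sequences (rounded) becomes the development set.
    public static List<List<List<String>>> split(List<List<String>> sequences, double devFraction) {
        int devSize = (int) Math.round(sequences.size() * devFraction);
        int cut = sequences.size() - devSize;
        List<List<List<String>>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(sequences.subList(0, cut)));
        parts.add(new ArrayList<>(sequences.subList(cut, sequences.size())));
        return parts;
    }
}
```

Each part can then be written back out with a blank line between sequences, and the development file passed as the second argument while tuning. The real test set is left for a final check.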

spaniard81