I was trying the SimpleTagger tutorial provided here. I ran exactly the same commands as given on the page, i.e.

java -cp "class:lib/mallet-deps.jar" cc.mallet.fst.SimpleTagger --train true --model-file nouncrf sample

and

java -cp "class:lib/mallet-deps.jar" cc.mallet.fst.SimpleTagger --model-file nouncrf stest
Here are my sample and stest files:
$ cat sample
Bill CAPITALIZED noun
slept non-noun
here LOWERCASE STOPWORD non-noun
$ cat stest
CAPITAL Al
slept
here
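As a quick sanity check on the training file, the distinct feature strings in sample can be counted from the shell. This assumes the usual SimpleTagger training format, where each line is one token, the last field is its label and every earlier field is a feature. It is only a rough comparison point; it does not claim that these strings are exactly what Mallet reports as "predicates".

```shell
# Count distinct feature strings in the training data, assuming the
# SimpleTagger format: features first, label as the last field.
# The here-document repeats the contents of the sample file above.
awk '{ for (i = 1; i < NF; i++) print $i }' <<'EOF' | sort -u | wc -l
Bill CAPITALIZED noun
slept non-noun
here LOWERCASE STOPWORD non-noun
EOF
```

For the three lines shown this prints 6 (some wc implementations pad the number with spaces). To check the real file instead, run awk '{ for (i = 1; i < NF; i++) print $i }' sample | sort -u | wc -l.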
However, my output differs from the one on their page. This is what I get:
Number of predicates: 9
noun
non-noun
non-noun
My questions are:
- What does "Number of predicates" denote?
- Why do I get 9 predicates, whereas the official source reports 5 for the same input files?
I'm using Mallet 2.0.8, if that matters.