Mallet SimpleTagger different number of predicates

Question

I was trying the SimpleTagger tutorial provided here. I ran exactly the same commands as given on the page, i.e.

java -cp "class:lib/mallet-deps.jar" cc.mallet.fst.SimpleTagger --train true --model-file nouncrf sample

and

java -cp "class:lib/mallet-deps.jar" cc.mallet.fst.SimpleTagger --model-file nouncrf stest

Here are my sample and stest files.

$ cat sample

Bill CAPITALIZED noun  
slept non-noun  
here LOWERCASE STOPWORD non-noun

$ cat stest

CAPITAL Al  
        slept  
        here

However, my output is different to the one on their page. This is the output I get.

Number of predicates: 9  
noun   
non-noun   
non-noun

My questions are

What does the "number of predicates" denote?

Why do I get 9 predicates, whereas the official source claims 5 predicates for the same input files?

I'm using Mallet 2.0.8, if that matters.

I get 9 as well, if that helps. – user1893354, Jun 30 '17 at 00:51

score 0 · Answer 1 · edited Jan 08 '18 at 07:02

When you start training, the first message that SimpleTagger gives you is:

Number of features in training data: x
Number of predicates: y

The number of predicates, y, is the number of distinct tokens (or lines) that your training data contains.

When you label a file using the model from the previous training run (which had y predicates), you get this message:

Number of predicates: z

This z is the sum of y and the number of distinct tokens (or lines) in the file you want to label. That is why z is always greater than or equal to y. For example, if you label an empty text file with a model that had y predicates, the reported number of predicates is y + 0 = y, because the empty file contributes nothing.
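
The arithmetic in this answer can be illustrated with a toy Python sketch. It is not Mallet's implementation: it applies the rule exactly as worded here (z = y plus the number of distinct whitespace-separated tokens in the file to label). The helper names are hypothetical, and y = 5 is an assumed value chosen only for illustration.

```python
def distinct_tokens(lines):
    """Return the set of whitespace-separated tokens found in the given lines."""
    tokens = set()
    for line in lines:
        tokens.update(line.split())
    return tokens


def predicates_after_labeling(y, lines_to_label):
    """Apply the rule as stated in this answer: z = y + number of distinct tokens."""
    return y + len(distinct_tokens(lines_to_label))


# Assumed for illustration only: a trained model that reported y = 5 predicates.
y = 5

# Contents of the stest file from the question, one string per line.
stest = ["CAPITAL Al", "        slept", "        here"]

z = predicates_after_labeling(y, stest)    # 5 + 4 distinct tokens
z_empty = predicates_after_labeling(y, [])  # an empty file adds nothing
print(z, z_empty)  # prints: 9 5
```

Under these assumptions the four distinct tokens in stest (CAPITAL, Al, slept, here) give z = 9, and an empty file leaves the count at y.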
