How to model features correctly in crfpp

Asked Feb 11 '15 at 07:58

Active Feb 11 '15 at 08:22

Below is my template file. If I treat every feature the same way as the first feature (the word), everything works fine. But when I try to use the shape features for the current word only, the model tags almost everything as PER.

I can't find a detailed description of the CRF++ template format, so I may have misunderstood it.

For the capitalization feature, is it OK to model only the current word's information and ignore the capitalization of the previous and next words?

# Unigram
# word
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
U06:%x[0,0]/%x[1,0]
U07:%x[-2,0]/%x[-1,0]/%x[0,0]/%x[1,0]/%x[2,0]

# is capitalized (current word only)
U08:%x[0,1]

# is all uppercased
U09:%x[0,2]

# is alphanumeric
U10:%x[0,3]

# lowercased prefix
U11:%x[0,4]

# lowercased suffix
U12:%x[0,5]


# word type, added for entities like "iphone 6"
U15:%x[0,6]
U16:%x[0,6]/%x[-1,1]
# to separate different language types
U17:%x[0,6]/%x[1,6]
U18:%x[-1,6]/%x[0,6]

# words enclosed in brackets are likely to be entities
U19:%x[0,7]
U20:%x[-1,7]
U21:%x[1,7]
U22:%x[0,7]/%x[1,7]
U23:%x[-1,7]/%x[0,7]
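
To make the template semantics concrete, here is a minimal Python sketch of how a `%x[row,col]` macro could be expanded for one token. It is not CRF++ code: the `expand` helper and the two-column sample data are hypothetical, and the `_B-1` / `_B+1` boundary markers are an assumption about how positions outside the sentence are represented. Under these assumptions, `row` is an offset relative to the current token and `col` is an absolute column index, so a line such as `U08:%x[0,1]` reads only the current token's flag.

```python
import re

# Matches CRF++-style macros such as %x[-1,0] (row offset, column index).
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, rows, pos):
    """Expand one template line for the token at index `pos`.

    `rows` is a list of token rows, each a list of column strings.
    Hypothetical helper for illustration, not part of CRF++.
    """
    def lookup(match):
        offset, col = int(match.group(1)), int(match.group(2))
        i = pos + offset
        if i < 0:
            # Assumed marker for positions before the first token.
            return "_B%d" % i
        if i >= len(rows):
            # Assumed marker for positions after the last token.
            return "_B+%d" % (i - len(rows) + 1)
        return rows[i][col]

    return MACRO.sub(lookup, template)


# Hypothetical two-column data: word, is-capitalized flag.
rows = [["John", "1"], ["lives", "0"], ["in", "0"], ["Paris", "1"]]

print(expand("U02:%x[0,0]", rows, 0))           # U02:John
print(expand("U08:%x[0,1]", rows, 3))           # U08:1
print(expand("U05:%x[-1,0]/%x[0,0]", rows, 0))  # U05:_B-1/John
```

Each expanded string acts as one feature identifier, so a template that only references row 0 of a column never looks at the neighbouring tokens' values in that column.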

edited Feb 11 '15 at 08:22

Tilney

Hey, is your problem still unsolved? – user2238884 Sep 07 '15 at 22:17
I think if both the current word and the previous word are capitalized, they tend to be in the same chunk, so this feature should make sense. Anyway, I didn't end up using the template above. What kind of problem did you encounter? – Tilney Sep 08 '15 at 02:48
No, actually I thought if you'd still need help, I'd answer. – user2238884 Sep 08 '15 at 17:54
Thanks! That would be great! – Tilney Sep 09 '15 at 06:11

0 Answers