Below is my template file. If I treat every feature the same way as the first feature (the word), everything works fine, but when I handle only the current word's shape features, the model tries its best to tag everything as PER...
I can't find a detailed description of the CRF++ template format, so I may have misunderstood it.
For the capitalization feature, is it OK to model only the current word's capitalization and ignore that of the previous and next words?
# Unigram
# word
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
U06:%x[0,0]/%x[1,0]
U07:%x[-2,0]/%x[-1,0]/%x[0,0]/%x[1,0]/%x[2,0]
# is capitalized (current word only)
U08:%x[0,1]
# is all uppercased
U09:%x[0,2]
# is alphanumeric
U10:%x[0,3]
# lowercased prefix
U11:%x[0,4]
# lowercased suffix
U12:%x[0,5]
# added for entities like iphone 6 (word type)
U15:%x[0,6]
U16:%x[0,6]/%x[-1,1]
# to separate different language types
U17:%x[0,6]/%x[1,6]
U18:%x[-1,6]/%x[0,6]
# words enclosed by brackets are likely to be an entity
U19:%x[0,7]
U20:%x[-1,7]
U21:%x[1,7]
U22:%x[0,7]/%x[1,7]
U23:%x[-1,7]/%x[0,7]
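
To make clear how I am reading the template, here is a minimal Python sketch of how I understand the `%x[row,col]` macro to be expanded: `row` is an offset relative to the current token and `col` is a column index in the training file. The `expand` helper, the two-column toy data, and the `_B-1`/`_B+1` padding strings are my assumptions for illustration, not code taken from CRF++.

```python
import re

# Matches one macro such as %x[-1,0]: (row offset, column index).
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def expand(template, rows, pos):
    """Expand every %x[row,col] macro in `template` for the token at index `pos`.

    `rows` is one sentence: a list of tokens, each a list of column values.
    """
    def lookup(match):
        row = pos + int(match.group(1))
        col = int(match.group(2))
        if row < 0:
            # Assumed padding string for positions before the sentence start.
            return "_B" + str(row)
        if row >= len(rows):
            # Assumed padding string for positions after the sentence end.
            return "_B+" + str(row - len(rows) + 1)
        return rows[row][col]
    return MACRO.sub(lookup, template)

# Hypothetical toy sentence: column 0 = word, column 1 = is-capitalized flag.
sentence = [["John", "Y"], ["lives", "N"], ["in", "N"], ["Paris", "Y"]]

print(expand("U02:%x[0,0]", sentence, 0))           # current word
print(expand("U08:%x[0,1]", sentence, 3))           # current word's capitalization only
print(expand("U05:%x[-1,0]/%x[0,0]", sentence, 0))  # previous word / current word
```

Under this reading, `U08:%x[0,1]` on its own is a valid template: it only ever produces two distinct feature strings (one per flag value), regardless of what the neighbouring words look like.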