Can I use numerical features in a CRF model?

Question

Is it possible, and is it a good idea, to add numerical features to CRF models, e.g. the position in the sequence?

I'm using CRFsuite. It seems all features are converted to strings, e.g. 'pos=0', 'pos=1', so they lose their meaning as numbers (the distance between values is no longer visible to the model).

Or should I use them to train another model, e.g. an SVM, and then ensemble it with the CRF model?

It can be done with format like `LABEL f1:0.1 f2:0.4 f3:0.8 f4:0.2 f5:0.9`. see https://datascience.stackexchange.com/a/4886/94403 — Hai Feng Kao, May 08 '20 at 07:11
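For readers who want to produce such lines programmatically, here is a minimal sketch. It assumes, from my reading of the CRFsuite manual, that fields are separated by tabs, that a trailing `:value` scales an attribute, and that literal colons and backslashes in attribute names must be escaped. The helper names `escape` and `format_item` are hypothetical.

```python
def escape(name):
    """Escape backslashes and colons so a colon in a name is not read as the value separator."""
    return name.replace("\\", "\\\\").replace(":", "\\:")


def format_item(label, features):
    """Render one item as 'LABEL<TAB>name:value<TAB>...' (tab separator is an assumption)."""
    fields = [label] + [f"{escape(name)}:{value}" for name, value in features.items()]
    return "\t".join(fields)


# One item with two scaled attributes and one attribute whose name contains a colon.
line = format_item("B-NP", {"f1": 0.1, "f2": 0.4, "w[0]=a:b": 1.0})
print(line)
```

One such line per item, with a blank line between sequences, is what I would expect the command-line tool to read; check the manual for your version before relying on it.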

score 9 · Accepted Answer · edited Apr 28 '16 at 21:24

I figured out that CRFsuite does handle numerical features, at least according to this documentation:

{“string_key”: float_weight, ...} dict where keys are observed features and values are their weights;

{“string_key”: bool, ...} dict; True is converted to 1.0 weight, False - to 0.0;

{“string_key”: “string_value”, ...} dict; that’s the same as {“string_key=string_value”: 1.0, ...}

[“string_key1”, “string_key2”, ...] list; that’s the same as {“string_key1”: 1.0, “string_key2”: 1.0, ...}

{“string_prefix”: {...}} dicts: nested dict is processed and “string_prefix” is prepended to each key.

{“string_prefix”: [...]} dicts: nested list is processed and “string_prefix” is prepended to each key.

{“string_prefix”: set([...])} dicts: nested set is processed and “string_prefix” is prepended to each key.

This works as long as:

I keep the input properly formatted;
I pass actual floats rather than strings containing floats;
I normalize the values.
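As a minimal sketch of those three points, the snippet below builds feature dicts in the documented form, with the position passed as a real float scaled to [0, 1]. The feature names and the helpers `min_max_scale` and `token_features` are hypothetical, and min-max scaling is only one possible way to normalize.

```python
def min_max_scale(value, lo, hi):
    """Map value from [lo, hi] to [0.0, 1.0]; return 0.0 for a degenerate range."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def token_features(tokens, i):
    """Build the feature dict for the token at index i (feature names are illustrative)."""
    last = len(tokens) - 1
    return {
        "bias": 1.0,
        "word.lower": tokens[i].lower(),       # string value: treated as an indicator
        "is_first": i == 0,                     # bool: documented as 1.0 / 0.0
        "rel_pos": min_max_scale(i, 0, last),   # float value, not the string 'pos=3'
    }


tokens = ["Numerical", "features", "work", "here", "too"]
xseq = [token_features(tokens, i) for i in range(len(tokens))]
```

A list like `xseq`, together with a label list of the same length, is the kind of input the python-crfsuite `Trainer.append(xseq, yseq)` call takes.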

What do you mean by normalize the float feature? how did you do it in context of CRFsuite? — GeorgeOfTheRF, May 30 '22 at 15:04

score 4 · Answer 2 · answered Oct 02 '14 at 15:06

A CRF itself can use numerical features, and you should use them. However, if your implementation converts them to strings (that is, encodes them in binary form via one-hot encoding), they may carry less information. I suggest looking for a more "pure" CRF implementation that allows continuous variables.

A fun fact is that a CRF is at its core just a structured MaxEnt (logistic regression) model, which works in the continuous domain. The string encoding is actually a way to map categorical values into that continuous domain, so your problem is a result of "overdesign" in CRFsuite, which forgot about the actual capabilities of the CRF model.
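To make the difference concrete, here is a small sketch of the weighted sum a log-linear model computes for one label, once with the position one-hot encoded as a string and once as a single real-valued feature. The weights are invented for illustration and the function names are hypothetical; this is not CRFsuite's code.

```python
def one_hot_position(pos):
    """String-style encoding: every position becomes its own indicator feature."""
    return {f"pos={pos}": 1.0}


def numeric_position(pos, seq_len):
    """Real-valued encoding: one feature whose value is the relative position."""
    return {"rel_pos": pos / (seq_len - 1) if seq_len > 1 else 0.0}


def linear_score(features, weights):
    """Weighted sum of feature values, as in a log-linear (MaxEnt) model."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical learned weights for a single label.
weights = {"pos=0": 0.7, "pos=1": -0.2, "rel_pos": 1.5}

seen = linear_score(one_hot_position(1), weights)         # uses the weight of 'pos=1'
unseen = linear_score(one_hot_position(7), weights)       # 'pos=7' was never seen: no weight
smooth = linear_score(numeric_position(7, 11), weights)   # one weight covers every position
```

With the one-hot encoding, a position that never appeared in training contributes nothing, while the real-valued feature shares a single weight across all positions.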

lejlot


Got you. The reason I go with CRFsuite is that it comes with a nice [python wrapper](http://python-crfsuite.readthedocs.org/en/latest/) which is easy to use. Will it help to use those numerical features in another model and then ensemble with crf? – Lishu Oct 02 '14 at 16:25
That does not seem right: a CRF is a sequence classifier, and ensembling it with a non-sequential model is rather weird. It would be much more profitable to look for a way to include the numerical features inside the CRF itself since, as said before, a CRF is fully capable of handling them. – lejlot Oct 02 '14 at 23:29

score 0 · Answer 3 · answered Sep 03 '19 at 22:11

Just to clarify the answer by Lishu a bit (it is correct, but it might confuse other readers as it confused me until I tried it). This:

{“string_key”: float_weight, ...} dict where keys are observed features and values are their weights

could have been written as

{“feature_template_name”: feature_value, ...} dict where keys are feature names and values are their values

i.e. with this you are not setting the weight the CRF assigns to this feature_template, but the value of the feature. I prefer to talk about feature templates that have feature values, which is clearer than just "features". The CRF then learns the weights itself: with string values, one weight per distinct feature_value of the feature_template; with float values, a single weight per feature_template whose contribution is scaled by the value.
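The distinction can be sketched by mimicking the documented dict conventions in plain Python. The function `flatten_features` is hypothetical and only illustrates the documented equivalences; it is not the library's actual implementation, and treating ints like floats is my own assumption.

```python
def flatten_features(features):
    """Illustrative only: turn a feature dict into {name: numeric value} pairs."""
    flat = {}
    for key, value in features.items():
        if isinstance(value, bool):           # test bool first: bool is a subclass of int
            flat[key] = 1.0 if value else 0.0
        elif isinstance(value, (int, float)):
            flat[key] = float(value)          # numeric feature value, kept as a number
        elif isinstance(value, str):
            flat[f"{key}={value}"] = 1.0      # string value becomes an indicator feature
        else:
            raise TypeError(f"unsupported value for {key!r}: {value!r}")
    return flat


flat = flatten_features({"rel_pos": 0.25, "is_title": True, "suffix": "ing"})
print(flat)
```

Every number in the result is a feature value supplied by you; none of them is a model weight, which only exists after training.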
