
I am currently working on a Named Entity Recognition task, using a Conditional Random Field algorithm to classify my marked entities. Is this algorithm bi-directional, like BERT?

The features the algorithm gets for each word include the previous and the next word, so I would guess it is. Does that also mean the CRF predicts over the whole sentence, or word by word?

Thank you for any lead on this question!

1 Answer


No.

For example, a linear-chain conditional random field looks like this:

[Figure: graphical model of a linear-chain CRF]

As you can see, to predict Y4 you use the observation feature phi_4'(Y4, X4) and the transition feature phi_3(Y3, Y4). This is because of the Markov assumption a CRF follows: the prediction of Y3 already depends on Y2 (plus Y3's own observation), so the transition score for Y4 is estimated from Y3 alone, not from anything further back.
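To make that concrete, this is the standard linear-chain CRF factorization (textbook notation, not from the original post):

    p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\left( \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right)

Each feature function f_k may look at the entire observation sequence x, but only at the adjacent label pair (y_{t-1}, y_t); that pairwise coupling is exactly the first-order Markov assumption described above.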

However, you can always feed your observation feature sequence in reverse order to obtain the reverse transition probabilities, as in the sketch below.
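Here is a minimal sketch of that reversal trick, assuming sklearn-crfsuite; the toy tokens, labels, and hyperparameters are placeholders of my own, not part of the original answer:

    # A minimal sketch, assuming sklearn-crfsuite (pip install sklearn-crfsuite).
    # Reversing each observation/label sequence before training makes the
    # learned transitions capture right-to-left label dependencies instead.
    import sklearn_crfsuite

    X = [[{"word": "Alice"}, {"word": "lives"}, {"word": "in"}, {"word": "Paris"}]]
    y = [["B-PER", "O", "O", "B-LOC"]]

    # Reverse every observation sequence and its label sequence.
    X_rev = [list(reversed(seq)) for seq in X]
    y_rev = [list(reversed(seq)) for seq in y]

    crf_reverse = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf_reverse.fit(X_rev, y_rev)

    # Predictions come back reversed, so flip them to restore word order.
    pred = [list(reversed(seq)) for seq in crf_reverse.predict(X_rev)]

You could then combine such a reversed model with a normal left-to-right one (e.g. by voting) if you want both transition directions to influence the final labels.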

0x5050
  • Thank you very much. What I still don't understand is why we provide the previous and the next word in our features. Is it because the algorithm also predicts in reverse order, as you said? – SpartanGandalf Sep 18 '19 at 11:10
  • Usually, CRFs are used for sequence labeling, i.e., to model the joint prediction of a sequence using both individual features and transition features. This becomes very useful for certain NLP tasks like NER, where you want to label a whole sequence of words rather than a single one, e.g. BIO tagging, and this is why inputs are given in left-to-right order. However, the feature functions in a CRF are arbitrary; inside a feature function you can process your input any way you want (including in reverse) and return an aggregated output (see the sketch below). – 0x5050 Sep 19 '19 at 05:17
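To illustrate the point in the last comment, here is a hedged sketch (my own example, not the commenter's code) of a per-token feature function that already looks at both neighbors, independently of the left-to-right transition structure:

    # A sketch of a per-token feature extractor in the sklearn-crfsuite style.
    # Observation features may freely inspect the whole sentence, including
    # words to the right; only the label transitions run left to right.
    def word2features(sent, i):
        word = sent[i]
        features = {
            "word.lower": word.lower(),
            "word.istitle": word.istitle(),
        }
        if i > 0:
            features["prev_word.lower"] = sent[i - 1].lower()  # left context
        else:
            features["BOS"] = True  # beginning of sentence
        if i < len(sent) - 1:
            features["next_word.lower"] = sent[i + 1].lower()  # right context
        else:
            features["EOS"] = True  # end of sentence
        return features

    sent = ["Angela", "Merkel", "visited", "Paris"]
    X = [[word2features(sent, i) for i in range(len(sent))]]

This is why the observation features can be "bidirectional" even though the label transitions themselves are first-order and run in one direction.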