
At each iteration, the WordPiece algorithm for subword tokenization merges the pair of symbols that increases the likelihood the most. In the literature it is only mentioned that this is the likelihood of a language model (e.g., the same likelihood used during decoding, in the case of NMT). Does anyone know which likelihood was used for the pre-processing of BERT?
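
To make the question concrete, here is a toy sketch of the merge criterion as it is commonly summarized for WordPiece (e.g., in the Hugging Face tokenizers course): the pair score is count(ab) / (count(a) · count(b)), i.e., the merge that raises the unigram language-model likelihood of the training data the most. This is not BERT's actual preprocessing code, just an illustration of the criterion I am asking about; the function name and corpus are made up.

```python
from collections import Counter

def best_merge(corpus_symbols):
    """Pick the pair whose merge most increases the unigram-LM likelihood.

    corpus_symbols: list of symbol sequences (one per word occurrence).
    Toy illustration only, not the actual BERT/WordPiece preprocessing code.
    """
    unigram = Counter()
    pair = Counter()
    for seq in corpus_symbols:
        unigram.update(seq)                 # symbol counts
        pair.update(zip(seq, seq[1:]))      # adjacent-pair counts

    # Commonly cited WordPiece score:
    #   score(a, b) = count(ab) / (count(a) * count(b))
    # which corresponds to choosing the merge that maximizes the
    # unigram language-model likelihood of the training corpus.
    def score(p):
        a, b = p
        return pair[p] / (unigram[a] * unigram[b])

    return max(pair, key=score)

# Toy corpus of pre-split words
corpus = [
    ["l", "o", "w"],
    ["l", "o", "w"],
    ["l", "o", "w", "e", "r"],
    ["n", "e", "w"],
]
print(best_merge(corpus))  # ('e', 'r') under this score, not the most frequent pair ('l', 'o')
```

Note that a plain BPE frequency criterion would pick the most frequent pair ("l", "o") here, whereas the likelihood-based score prefers a rarer pair whose parts are themselves rare. My question is which likelihood (and over which corpus) was actually used when building BERT's WordPiece vocabulary.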

SweetSpot

0 Answers