I would like to append words to the vocabulary created by tft.vocabulary that are not a part of the training samples (i.e. <mask> and <pad> tokens).
I see in the docs that the tft.vocabulary function can take an argument key_fn, which the docs describe as follows:
Supply key_fn if you would like to generate a vocabulary with coverage over specific keys.
However, with the key_fn below, the <mask> and <pad> tokens are still not appended to the vocabulary:
def _key_fn(x):
    return tf.constant(['<mask>', '<pad>'])

vocab = tft.vocabulary(
    words,
    key_fn=lambda x: _key_fn(x),
    top_k=config.VOCAB_SIZE
)
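
For comparison, here is a minimal sketch of one possible workaround that does not rely on key_fn: appending the special tokens to the vocabulary file after it has been written. It assumes the vocabulary was saved as a plain text file with one token per line and no stored frequencies. The helper name append_special_tokens and the file name vocab.txt are hypothetical and are not part of tft. This is not a confirmed tft feature, only a post-processing step on the file, and any downstream lookup would need to use the enlarged vocabulary size.

```python
import os
import tempfile


def append_special_tokens(vocab_path, special_tokens=("<mask>", "<pad>")):
    """Append missing special tokens to a one-token-per-line vocab file.

    Hypothetical helper (not part of tft). Assumes a plain text file with
    one token per line and no frequency column.
    """
    # Read the existing tokens, skipping blank lines.
    with open(vocab_path, "r", encoding="utf-8") as f:
        tokens = [line.rstrip("\n") for line in f if line.strip()]

    # Append only the special tokens that are not already present,
    # so the existing token order (and therefore indices) is preserved.
    seen = set(tokens)
    for tok in special_tokens:
        if tok not in seen:
            tokens.append(tok)
            seen.add(tok)

    # Rewrite the file with the extended vocabulary.
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(tokens) + "\n")
    return tokens


# Usage on a stand-in vocabulary file (vocab.txt is a made-up name).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("the\ncat\nsat\n")
    print(append_special_tokens(path))
    # → ['the', 'cat', 'sat', '<mask>', '<pad>']
```

Appending at the end keeps the indices of the existing tokens unchanged. If the special tokens need fixed low indices (for example <pad> at index 0), they would have to be inserted at the front instead, which shifts every other index.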