If the vocabulary is ordered from the most frequent word to the least frequent, placing '[UNK]' at the beginning implies that it occurs most often. But what if '[UNK]' isn't actually the most frequent word? Should I put it somewhere else in the vocabulary, according to its true frequency?
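For context, here is a quick check of where '[UNK]' lands by default (a minimal sketch assuming the `TextVectorization` layer that the tutorial uses; the toy sentences are my own):

```python
import tensorflow as tf

# TextVectorization reserves index 0 for the padding token '' and
# index 1 for '[UNK]', regardless of how often unknown words occur.
# The remaining tokens are sorted by descending frequency.
vectorize_layer = tf.keras.layers.TextVectorization(max_tokens=10)
vectorize_layer.adapt(["the cat sat on the mat", "the dog sat"])
print(vectorize_layer.get_vocabulary())
# ['', '[UNK]', 'the', 'sat', ...]
```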
I ran into this issue while working through this tutorial: https://www.tensorflow.org/tutorials/text/word2vec
When I do negative sampling with the function tf.random.log_uniform_candidate_sampler, tokens with low IDs (e.g. 0, 1, 2, ...) are sampled most often. If '[UNK]' is first in the vocabulary (or second when a padding token is used), it gets token ID 0 (or 1), so it will be heavily sampled as a negative sample. If '[UNK]' really does occur a lot, that's not a problem, but what if it doesn't? Shouldn't it then receive a higher token ID?
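Here is a small sketch illustrating what I mean (the vocabulary size, the placeholder positive token, and the loop counts are arbitrary assumptions, not values from the tutorial):

```python
import tensorflow as tf

vocab_size = 1000  # assumed vocabulary size for illustration
true_classes = tf.constant([[42]], dtype=tf.int64)  # hypothetical positive token

# Tally which token IDs the log-uniform (Zipfian) sampler draws
# as negative samples over many calls.
counts = {}
for _ in range(1000):
    sampled, _, _ = tf.random.log_uniform_candidate_sampler(
        true_classes=true_classes,
        num_true=1,
        num_sampled=5,
        unique=True,
        range_max=vocab_size)
    for token in sampled.numpy():
        counts[token] = counts.get(token, 0) + 1

# Low IDs (0, 1, 2, ...) dominate, so whatever sits at index 0,
# e.g. '[UNK]', is drawn as a negative sample most often.
print(sorted(counts.items(), key=lambda kv: -kv[1])[:10])
```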