
The idea of using BertTokenizer from huggingface really confuses me.

  1. When I use

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    tokenizer.encode_plus("Hello")
    

Is the result somewhat similar to what I would get by passing a one-hot vector representing "Hello" to a learnable embedding matrix?

  2. How is

    BertTokenizer.from_pretrained("bert-base-uncased")
    

different from

    BertTokenizer.from_pretrained("bert-large-uncased")

and other pretrained models?

desertnaut

1 Answer


The encode_plus and encode functions tokenize your text and prepare it in the proper input format for the BERT model, so you can think of them as playing the same role as the one-hot step in your example: they map the text to integer token ids, which the model's embedding layer then looks up.
encode_plus returns a BatchEncoding consisting of input_ids, token_type_ids, and attention_mask.
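
For illustration, here is a minimal sketch of what the call from your question returns (the exact token ids come from the checkpoint's vocabulary, so treat the numbers in the comments as indicative):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    encoded = tokenizer.encode_plus("Hello")

    # BatchEncoding behaves like a dict with three keys
    print(encoded["input_ids"])       # token ids, e.g. [101, 7592, 102] for [CLS] hello [SEP]
    print(encoded["token_type_ids"])  # segment ids, all 0 for a single sentence
    print(encoded["attention_mask"])  # 1 for real tokens, 0 for padding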

The pre-trained models differ in the number of encoder layers: the base model has 12 encoder layers, while the large model has 24.
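
If it helps, you can check this directly from the model configurations (a small sketch, assuming the transformers library can download the configs):

    from transformers import BertConfig

    base = BertConfig.from_pretrained("bert-base-uncased")
    large = BertConfig.from_pretrained("bert-large-uncased")

    print(base.num_hidden_layers, base.hidden_size)    # 12 layers, hidden size 768
    print(large.num_hidden_layers, large.hidden_size)  # 24 layers, hidden size 1024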

Parsa Abbasi