How to get tokens to words in BERT tokenizer

Asked Mar 21 '22 at 04:08

Active Mar 21 '22 at 06:05

Viewed 2,553 times

I have a list of tokens. Using the Hugging Face BERT tokenizer, I can map them to their numerical IDs:

X = ['[CLS]', '[MASK]', 'love', 'this', '[SEP]']
tokens = tokenizer.convert_tokens_to_ids(X)
# tokens: [101, 103, 2293, 2023, 102]

Is there a function that converts tokens = [101, 103, 2293, 2023, 102] back to the words ['[CLS]', '[MASK]', 'love', 'this', '[SEP]']?

One option is to build the reverse mapping by hand, but is there a built-in function that does this easily?
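
For reference, the manual mapping could look roughly like the sketch below. The `vocab` dictionary and the `ids_to_tokens` helper are hypothetical stand-ins: `vocab` holds only the five token/ID pairs shown above, whereas with a loaded Hugging Face tokenizer the full token-to-ID dictionary should be available from `tokenizer.get_vocab()`.

```python
# Hypothetical stand-in for the tokenizer vocabulary: only the five
# token/ID pairs from the question. With a real Hugging Face tokenizer,
# tokenizer.get_vocab() returns the full token -> ID dictionary.
vocab = {"[CLS]": 101, "[SEP]": 102, "[MASK]": 103, "this": 2023, "love": 2293}

# Invert the mapping once so that each ID lookup is a plain dict access.
id_to_token = {token_id: token for token, token_id in vocab.items()}


def ids_to_tokens(ids, unk_token="[UNK]"):
    """Map each ID back to its token, falling back to unk_token if unknown."""
    return [id_to_token.get(i, unk_token) for i in ids]


tokens = [101, 103, 2293, 2023, 102]
words = ids_to_tokens(tokens)
print(words)  # ['[CLS]', '[MASK]', 'love', 'this', '[SEP]']
```

This works, but it duplicates bookkeeping that the tokenizer presumably already does internally, which is why a built-in function would be preferable.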

asked Mar 21 '22 at 04:08 by kowser66

edited Mar 21 '22 at 06:05 by marc_s


`tokenizer.convert_ids_to_tokens(tokens, skip_special_tokens=False)`? – dennlinger Mar 24 '22 at 11:58

0 Answers