Fairseq without dictionary

Question

I preprocessed my data with a Hugging Face tokenizer and encoder, and now I want to use Fairseq's transformer model for a translation task, but I don't have a dict.txt. What should I do?

Can I train by supplying only the input and output data, or if not, how do I create dict.txt?

Please provide enough code so others can better understand or reproduce the problem. — Community, Feb 19 '23 at 05:10

score 0 · Answer 1 · answered Feb 19 '23 at 03:41

The dict.txt file is included with the pre-trained model. For transformer models, see Pre-trained models.

Downloading and extracting transformer_lm.wmt19.en gives the following file structure:

wmt19.en
|- bpecodes
|- dict.txt
|- model.pt

Also, according to the docs, the model uses Byte Pair Encoding (BPE). So if you want to train a new model, you might need to pre-process the text with BPE first.
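If no pre-trained dict.txt matches your tokenizer, one option is to build the file yourself. As far as I know, dict.txt is a plain text file with one `<token> <count>` entry per line, and Fairseq adds its special symbols (`<s>`, `<pad>`, `</s>`, `<unk>`) itself when loading it. Below is a minimal sketch under those assumptions. The helper names `build_dict` and `write_dict` and the sample lines are hypothetical, and the input is assumed to be text already split into space-separated subword strings (tokens, not integer IDs).

```python
import tempfile
from collections import Counter
from pathlib import Path


def build_dict(tokenized_lines, min_count=1):
    """Count space-separated tokens; return (token, count) pairs, most frequent first."""
    counts = Counter()
    for line in tokenized_lines:
        counts.update(line.split())
    # Sort by descending count, then alphabetically so the order is stable.
    pairs = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(tok, n) for tok, n in pairs if n >= min_count]


def write_dict(pairs, path):
    """Write one "<token> <count>" entry per line (assumed dict.txt layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for tok, n in pairs:
            f.write(f"{tok} {n}\n")


# Hypothetical sample: two lines already split into BPE-style subwords.
sample = ["he@@ llo wor@@ ld", "he@@ llo there"]
pairs = build_dict(sample)

# Written to a temporary directory here; point this at your data directory instead.
out_path = Path(tempfile.mkdtemp()) / "dict.txt"
write_dict(pairs, out_path)
print(out_path.read_text(encoding="utf-8"))
```

Alternatively, if you write the tokenized text to files such as train.src and train.tgt (hypothetical names), running fairseq-preprocess with --source-lang, --target-lang, --trainpref and --destdir should build the dictionaries for you when --srcdict and --tgtdict are not given. Either way, the tokens in dict.txt need to match exactly the tokens that appear in the training data.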
