
I used a Hugging Face tokenizer and encoder to preprocess my data, and now I want to use Fairseq's transformer model for the translation task, but I don't have a dict.txt. What should I do?

Can I train with only the input and output data, or how do I create dict.txt?

rgy k

1 Answer


The dict.txt file is bundled with the pre-trained model. For transformer models, see Pre-trained models.

Downloading and extracting transformer_lm.wmt19.en gives the following file structure:

wmt19.en
|- bpecodes
|- dict.txt
|- model.pt

Also, according to the docs, the model uses Byte Pair Encoding (BPE). So if you want to train a new model, you might need to pre-process the text with BPE first.
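
If you keep the Hugging Face tokenization, the dict.txt from a pre-trained model will probably not match your vocabulary, so you would need to build your own. As far as I know, Fairseq's dictionary file is plain text with one `<token> <count>` entry per line, and the special symbols are added by Fairseq itself when the file is loaded. Here is a minimal sketch, assuming your training text has already been converted to space-separated subword tokens. The function names `build_dict_lines` and `write_dict` are made up for illustration and are not part of Fairseq:

```python
from collections import Counter
from typing import Iterable, List


def build_dict_lines(tokenized_lines: Iterable[str]) -> List[str]:
    """Count whitespace-separated tokens and format them as '<token> <count>'."""
    counts = Counter()
    for line in tokenized_lines:
        counts.update(line.split())
    # Most frequent first; ties broken alphabetically so the order is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{token} {count}" for token, count in ordered]


def write_dict(tokenized_lines: Iterable[str], path: str = "dict.txt") -> None:
    """Write the entries to a file, one per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in build_dict_lines(tokenized_lines):
            f.write(entry + "\n")


if __name__ == "__main__":
    # Assumption: each string is one sentence of already-tokenized text.
    sample = ["▁hello ▁world", "▁hello ▁there"]
    for entry in build_dict_lines(sample):
        print(entry)
```

Running `fairseq-preprocess` on the tokenized files should also generate the dictionaries for you, so writing the file by hand is probably only worth it if you want direct control over the vocabulary.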

Wakeme UpNow