I'm running Fairseq from the command line. Fairseq loads the language models on the fly and does the translation. It works fine, but it takes time to load the models and perform the translation. I'm thinking that if we run Fairseq as an in-memory service and pre-load all the language models, each translation request should be much quicker.
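To make this concrete, here is a rough sketch of what I have in mind. It's based on the fairseq Python hub interface (`TransformerModel.from_pretrained` / `translate`) with Flask as the HTTP layer; the paths, BPE settings, and endpoint name are just placeholders, not a working setup.

```python
# Rough sketch: load the model once at startup, then serve translations
# over HTTP so the checkpoint never has to be reloaded per request.
# Paths and BPE settings are placeholders for the real checkpoint/data.
from flask import Flask, request, jsonify
from fairseq.models.transformer import TransformerModel

app = Flask(__name__)

# Pre-load the model a single time when the service starts.
en2de = TransformerModel.from_pretrained(
    '/path/to/checkpoints',
    checkpoint_file='model.pt',
    data_name_or_path='/path/to/data-bin',
    bpe='fastbpe',
    bpe_codes='/path/to/bpecodes',
)
en2de.eval()  # disable dropout for inference

@app.route('/translate', methods=['POST'])
def translate():
    # Each request only pays the cost of generation, not model loading.
    text = request.json['text']
    return jsonify({'translation': en2de.translate(text)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
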
My questions are:
- Will it be more efficient if we run Fairseq as an in-memory service and pre-load the language models?
- How much of an efficiency increase can we expect?
- How easy will it be to implement such an in-memory Fairseq service?
Thank you very much for helping out.