I'm running Fairseq from the command line. Fairseq loads the language models on the fly and does the translation. It works fine, but it takes time to load the models and perform the translation. I'm thinking that if we run Fairseq as an in-memory service and pre-load all the language models, each translation request should be much quicker.
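To make this concrete, here is a rough sketch of what I have in mind. It's based on the fairseq Python hub interface (`TransformerModel.from_pretrained` / `translate`) with Flask as the HTTP layer; the paths, BPE settings, and endpoint name are just placeholders, not a working setup.

```python
# Rough sketch: load the model once at startup, then serve translations
# over HTTP so the checkpoint never has to be reloaded per request.
# Paths and BPE settings are placeholders for the real checkpoint/data.
from flask import Flask, request, jsonify
from fairseq.models.transformer import TransformerModel

app = Flask(__name__)

# Pre-load the model a single time when the service starts.
en2de = TransformerModel.from_pretrained(
    '/path/to/checkpoints',
    checkpoint_file='model.pt',
    data_name_or_path='/path/to/data-bin',
    bpe='fastbpe',
    bpe_codes='/path/to/bpecodes',
)
en2de.eval()  # disable dropout for inference

@app.route('/translate', methods=['POST'])
def translate():
    # Each request only pays the cost of generation, not model loading.
    text = request.json['text']
    return jsonify({'translation': en2de.translate(text)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
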
My questions are:
- Will it be more efficient if we run Fairseq as an in-memory service and pre-load the language models?
- How much of an efficiency increase can we expect?
- How easy will it be to implement such an in-memory Fairseq service?
Thank you very much for helping out.