I was trying to use fairseq to train a model on English-Russian, English-French, English-Spanish, and English-German data, but I keep getting a CUDA error that prevents training from running. I have tried multiple batch sizes and learning rates, but have not been able to get past it.
fairseq-train pre \
--arch transformer_wmt_en_de \
--task translation_multi_simple_epoch \
--encoder-langtok src --decoder-langtok --lang-pairs en-ru,en-fr,en-es,en-de \
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--optimizer adam --adam-betas '(0.9, 0.98)' \
--lr-scheduler inverse_sqrt --lr 1e-03 --warmup-updates 4000 --max-update 100000 \
--dropout 0.3 --weight-decay 0.0001 \
--max-tokens 4096 --max-epoch 20 --update-freq 8 \
--save-interval 10 --save-interval-updates 5000 --keep-interval-updates 20 \
--log-format simple --log-interval 100 \
--save-dir checkpoints --validate-interval-updates 5000 \
--fp16 --num-workers 0 --batch-size 64
The above command is what I have used, with various different values for batch size, learning rate, etc., but every run ends in the same CUDA error.
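To make "various different values" concrete, the kind of change I have been experimenting with looks roughly like this (the numbers are only illustrative, and the rest of the command stays the same as above):

--max-tokens 2048 --batch-size 32 --update-freq 16

Whatever values I choose, training fails with: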
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.57 GiB (GPU 0; 15.74 GiB total capacity; 5.29 GiB already allocated; 9.50 GiB free; 5.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
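For what it's worth, my reading of the allocator hint at the end of the error message is that it would be applied like this before launching training (the 128 MiB split size is just an example value, not something I have verified helps):

# Example only: set the max_split_size_mb option suggested in the error message,
# then re-run the same fairseq-train command as above.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

I have not confirmed whether fragmentation is actually the problem here, though.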
Any kind of help would be appreciated.