I am running the following command:

onmt_translate  -model demo-model_step_100000.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose

The results in the file 'pred.txt' are completely different from the source sentences given for translation.

The corpus size was 3000 parallel sentences. The preprocess command was:

onmt_preprocess -train_src EMMT/01engParallel_onmt.txt -train_tgt EMMT/01maiParallel_onmt.txt -valid_src EMMT/01engValidation_onmt.txt -valid_tgt EMMT/01maiValidation_onmt.txt -save_data EMMT/demo

Training was done on the demo model:

onmt_train -data EMMT/demo -save_model demo-model

1 Answer

You cannot get decent translations even on "seen" data because:

  • Your model was trained on far too few sentence pairs (3000 is really far too few to train a good model). You can only get more or less meaningful translations with corpora of 4M+ sentence pairs (and the more, the better).
  • onmt_train -data EMMT/demo -save_model demo-model trains a small (2 layers × 500 units) unidirectional RNN model (see the documentation). The Transformer model type is recommended to obtain state-of-the-art results.

The FAQ says this about how to run a Transformer model training:

The transformer model is very sensitive to hyperparameters. To run it effectively you need to set a bunch of different options that mimic the Google setup. We have confirmed the following command can replicate their WMT results.

python  train.py -data /tmp/de2/data -save_model /tmp/extra \
        -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8  \
        -encoder_type transformer -decoder_type transformer -position_encoding \
        -train_steps 200000  -max_generator_batches 2 -dropout 0.1 \
        -batch_size 4096 -batch_type tokens -normalization tokens  -accum_count 2 \
        -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0  -param_init_glorot \
        -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
        -world_size 4 -gpu_ranks 0 1 2 3

Here is what each of the parameters means:

param_init_glorot -param_init 0: correct initialization of parameters

position_encoding: add sinusoidal position encoding to each embedding

optim adam, decay_method noam, warmup_steps 8000: use the Adam optimizer with the Noam learning-rate schedule (linear warmup, then inverse-square-root decay; see the sketch below).

batch_type tokens, normalization tokens, accum_count 2: batch and normalize based on the number of tokens, not sentences. Accumulate gradients over two batches before each update.

label_smoothing 0.1: use label smoothing loss.
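
To make the Noam learning-rate schedule concrete: the rate ramps up linearly for the first warmup_steps updates, then decays with the inverse square root of the step count, scaled down for wider models. Here is a minimal Python sketch of that schedule; the exact scaling OpenNMT-py applies may differ slightly, and model_dim=512, factor=2 (from -learning_rate 2) and warmup=8000 are simply taken from the command above:

def noam_lr(step, model_dim=512, factor=2.0, warmup=8000):
    # Linear warmup for `warmup` steps, then ~step**-0.5 decay,
    # scaled by model_dim**-0.5 as in "Attention Is All You Need".
    return factor * model_dim ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

for step in (1000, 8000, 100000):
    print(step, round(noam_lr(step), 6))  # peak is reached at step == warmup

This is also why -learning_rate 2 looks unusually large: with decay_method noam it acts as a scale factor for this schedule, not as a literal learning rate.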

  • Thanks. I am now training with 45000 parallel sentences. The pre-processing goes fine, but the train command fails with "Assertion Error". Here is my train command: onmt_train -encoder_type transformer -decoder_type transformer -data EMMT/demo -save_model demo-model – girishnjha Sep 01 '20 at 18:57
  • @girishnjha 45000 sentence pairs is still too little data. Without the full error stack trace, it is impossible to help you more. – Wiktor Stribiżew Sep 01 '20 at 18:58
  • The AssertionError goes away if I remove the transformer encoder/decoder options – girishnjha Sep 02 '20 at 01:25
  • [TransformerEncoderLayer( File "C:\Users\Girish\AppData\Local\Programs\Python\Python38\lib\site-packages\opennmt_py-1.2.0-py3.8.egg\onmt\encoders\transformer.py", line 30, in __init__ self.self_attn = MultiHeadedAttention( File "C:\Users\Girish\AppData\Local\Programs\Python\Python38\lib\site-packages\opennmt_py-1.2.0-py3.8.egg\onmt\modules\multi_headed_attn.py", line 53, in __init__ assert model_dim % head_count == 0 AssertionError – girishnjha Sep 03 '20 at 06:14
  • this is the result of adding "transformer" as encoder/decoder. Will appreciate any help – girishnjha Sep 03 '20 at 06:15
  • @girishnjha Just see how to run a transformer training [here](https://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model). Again, 45K sentence pair corpus is **too small**. You will only be able to get good results with 4 **million** and more. – Wiktor Stribiżew Sep 03 '20 at 07:38
  • Hi, I am training on a workstation with Windows 10 and an AMD GPU. Getting the following error: File "C:\Users\ritu\anaconda3\lib\site-packages\opennmt_py-1.2.0-py3.8.egg\onmt\bin\train.py", line 161, in __init__ signal.signal(signal.SIGUSR1, self.signal_handler) AttributeError: module 'signal' has no attribute 'SIGUSR1' – girishnjha Sep 13 '20 at 17:51
  • My train command now is - onmt_train -data EMMT/demo -save_model demo-model -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 200000 -max_generator_batches 2 -dropout 0.1 -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 -world_size 4 -gpu_ranks 0 1 2 3 – girishnjha Sep 13 '20 at 17:54
  • @girishnjha I am not sure if 1) ONMT-py can work on Windows, 2) you need an NVIDIA GPU; from what I know, AMD GPUs usually do not work. – Wiktor Stribiżew Sep 13 '20 at 18:25
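
A note on the AssertionError quoted in the comments: multi-head attention requires the model dimension to be evenly divisible by the number of heads. Adding -encoder_type transformer -decoder_type transformer while keeping the default width of 500 (the same 500-unit size mentioned in the answer) and the default 8 heads fails that check, since 500 is not divisible by 8; the FAQ command sidesteps this with -rnn_size 512 -word_vec_size 512. A minimal sketch of the check (the 500 and 8 defaults are assumptions based on the answer and the traceback; the real assert sits in onmt/modules/multi_headed_attn.py):

def heads_divide_model_dim(model_dim, head_count):
    # Mirrors `assert model_dim % head_count == 0` from the traceback above.
    return model_dim % head_count == 0

print(heads_divide_model_dim(500, 8))  # False -> AssertionError with the default sizes
print(heads_divide_model_dim(512, 8))  # True  -> the FAQ's -rnn_size 512 passes

So when adapting the FAQ command, keep -rnn_size and -word_vec_size at a multiple of -heads (or lower -heads so it divides evenly).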