
I am trying to use PaddlePaddle (https://github.com/baidu/Paddle) to train an (encoder-decoder) sequence-to-sequence model for POS tagging.

But instead of using a one-hot embedding of the word indices as input, I am using imaginary word vectors that I've created with numpy. I have added the word vectors to the settings variable in the hook() function of dataprovider.py:

import numpy as np

def hook(settings, src_dict, trg_dict, file_list, **kwargs):
    # job_mode = 1: training mode
    # job_mode = 0: generating mode
    settings.job_mode = trg_dict is not None
    settings.src_dict = src_dict
    settings.logger.info("src dict len : %d" % (len(settings.src_dict)))
    settings.sample_count = 0
    settings.thematrix = np.random.rand(len(src_dict), len(trg_dict))
    if settings.job_mode:
        settings.trg_dict = trg_dict
        settings.slots = [
            #integer_value_sequence(len(settings.src_dict)),
            dense_vector_sequence(len(settings.src_dict)),
            integer_value_sequence(len(settings.trg_dict)),
            integer_value_sequence(len(settings.trg_dict)),
        ]
        settings.logger.info("trg dict len : %d" % (len(settings.trg_dict)))
    else:
        settings.slots = [
            integer_value_sequence(len(settings.src_dict)),
            integer_value_sequence(len(open(file_list[0], "r").readlines()))
        ]
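As a side note on the matrix built in hook() above (a toy-sized sketch, with small sizes standing in for the real 45661 and 422): each row of thematrix has length len(trg_dict), even though the dense slot is declared with len(settings.src_dict), and the rows are numpy arrays rather than plain Python lists:

```python
import numpy as np

# Toy sizes standing in for len(src_dict) = 45661 and len(trg_dict) = 422.
src_dict_len, trg_dict_len = 5, 3
thematrix = np.random.rand(src_dict_len, trg_dict_len)

# One row per source word; note the row length is trg_dict_len,
# while the slot above is declared as
# dense_vector_sequence(len(settings.src_dict)).
row = thematrix[0]
print(row.shape)   # (3,)
print(type(row))   # <class 'numpy.ndarray'>, not a plain list
```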

And when iterating through the sentences and their POS tags, I yield these imaginary vectors instead of the word indices, at https://github.com/alvations/rowrow/blob/master/dataprovider.py#L66
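A minimal sketch of that yield (with hypothetical names and toy sizes; the full code is in the link above). Each time step handed to a dense_vector_sequence slot is supposed to be a plain Python sequence of floats, so one way to be safe is to convert the numpy rows with .tolist():

```python
import numpy as np

# Hypothetical stand-ins: 10 "words" with 4-dimensional vectors.
thematrix = np.random.rand(10, 4)
src_ids = [3, 1, 7]  # word indices for one source sentence

# Convert each numpy row into a plain Python list of floats,
# one list per time step of the source sequence.
dense_seq = [thematrix[i].tolist() for i in src_ids]

# Inside process() this would then be yielded alongside the target
# index sequences, e.g.:
# yield dense_seq, trg_ids, trg_next_ids
```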

Within the sequence-to-sequence model, since the input (i.e. the data_layer()) is no longer a one-hot encoding, I don't wrap it in an embedding layer. Instead, I use a fully connected layer to squeeze the vector inputs into the encoder size, i.e. https://github.com/alvations/rowrow/blob/master/seqToseq_net.py#L49:

src_word_id = data_layer(name='source_language_word', size=source_dict_dim)
src_embedding = fc_layer(input=src_word_id, size=word_vector_dim)   
src_forward = simple_gru(input=src_embedding, size=encoder_size)
src_backward = simple_gru(input=src_embedding, size=encoder_size, reverse=True)
encoded_vector = concat_layer(input=[src_forward, src_backward])
with mixed_layer(size=decoder_size) as encoded_proj:
    encoded_proj += full_matrix_projection(input=encoded_vector)

Usually, the embedding layer would be something like:

src_embedding = embedding_layer(
    input=src_word_id,
    size=word_vector_dim,
    param_attr=ParamAttr(name='_source_language_embedding'))

The neural network computation graph seems to be correct, since running train.sh does not throw any network-related error.

But it crashes when fetching the next batch:

~/Paddle/demo/rowrow$ bash train.sh 
I1104 18:59:42.636052 18632 Util.cpp:151] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=train.conf --save_dir=/home/ltan/Paddle/demo/rowrow/model --use_gpu=true --num_passes=100 --show_parameter_stats_period=1000 --trainer_count=4 --log_period=10 --dot_period=5 
I1104 18:59:46.503566 18632 Util.cpp:126] Calling runInitFunctions
I1104 18:59:46.503810 18632 Util.cpp:139] Call runInitFunctions done.
[WARNING 2016-11-04 18:59:46,847 default_decorators.py:40] please use keyword arguments in paddle config.
[INFO 2016-11-04 18:59:46,856 networks.py:1125] The input order is [source_language_word, target_language_word, target_language_next_word]
[INFO 2016-11-04 18:59:46,857 networks.py:1132] The output order is [__cost_0__]
I1104 18:59:46.871026 18632 Trainer.cpp:170] trainer mode: Normal
I1104 18:59:46.871906 18632 MultiGradientMachine.cpp:108] numLogicalDevices=1 numThreads=4 numDevices=4
I1104 18:59:46.988584 18632 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
[INFO 2016-11-04 18:59:46,990 dataprovider.py:15] src dict len : 45661
[INFO 2016-11-04 18:59:47,316 dataprovider.py:26] trg dict len : 422
I1104 18:59:47.347944 18632 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
[INFO 2016-11-04 18:59:47,348 dataprovider.py:15] src dict len : 45661
[INFO 2016-11-04 18:59:47,657 dataprovider.py:26] trg dict len : 422
I1104 18:59:47.658279 18632 GradientMachine.cpp:134] Initing parameters..
I1104 18:59:49.244287 18632 GradientMachine.cpp:141] Init parameters done.
F1104 18:59:50.485621 18632 PythonUtil.h:213] Check failed: PySequence_Check(seq_) 
*** Check failure stack trace: ***
    @     0x7f71f521adaa  (unknown)
    @     0x7f71f521ace4  (unknown)
    @     0x7f71f521a6e6  (unknown)
    @     0x7f71f521d687  (unknown)
    @           0x54dac9  paddle::DenseScanner::fill()
    @           0x54f1d1  paddle::SequenceScanner::fill()
    @           0x5543cc  paddle::PyDataProvider2::getNextBatchInternal()
    @           0x5779b2  paddle::DataProvider::getNextBatch()
    @           0x6a01f7  paddle::Trainer::trainOnePass()
    @           0x6a3b57  paddle::Trainer::train()
    @           0x53a2b3  main
    @     0x7f71f4426f45  (unknown)
    @           0x545ae5  (unknown)
    @              (nil)  (unknown)
/home/ltan/Paddle/binary/bin/paddle: line 81: 18632 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

I've tried asking on Paddle's gitter.im, but there's been no response.

Does anyone know:

  • What does the error mean?
  • How do I feed a dense vector sequence into a seqToseq model in Paddle?
  • Why does Paddle throw this error when a dense_vector_sequence is fed into a seqToseq model?
alvas
