
I'm working on a multi-layer RNN for a word-level language model. It has a one-to-one configuration, so for each step in the training set a prediction is validated against an eval set of the same dimensions. The model borrows almost entirely from the example laid out here, adapted for word-level modeling. My variation is here.

Another difference from the example script linked above is that the data I'm working with has multiple buckets. I believe I've bucketed the data correctly, but when it's fed to mx.model.buckets, I get some variation of this error:

Start training with 1 devices
Error in exec$update.arg.arrays(arg.arrays, match.name, skip.null) : 
  [17:25:59] c:\jenkins\workspace\mxnet\mxnet\src\operator\tensor\../elemwise_op_common.h:123: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node  at 0-th output: expected [32,7], got [32,8]

The issue appears to be a mismatch between the actual and expected input dimensions. The batch size is 32, and the values of 7 and 8 appear related to consecutive buckets in bucket.plan.

train_data <- mx.io.bucket.iter(buckets = train_buckets$buckets,
                                batch.size = batch_size, 
                                data.mask.element = 0, shuffle = TRUE)

eval_data <- mx.io.bucket.iter(buckets = eval_buckets$buckets,
                               batch.size = batch_size,
                               data.mask.element = 0, shuffle = TRUE)

From the iterators above, I can pull the batch and bucket data:

> head(train_data$bucket.plan, 10)
 8  9 10  3 22 10 10  8 15 21 
 1  1  1  1  1  2  3  2  1  1 
> train_data$batch
[1] 1
> train_data$bucketID
8 
1 

The value at train_data$batch is used in bucket.iter as an index for pulling the next bucket name from train_data$bucket.plan:

iter.next = function() {
        .self$batch <- .self$batch + 1
        .self$bucketID <- .self$bucket.plan[batch]

The bucket names correspond to the length of the sentences assigned to them minus 1. So for bucket 8, the batch dimension should be [32, 7], which is what the training function expects. But as the error states, it's actually receiving the input dimensions for bucket.plan[2], i.e. bucket 9, which is [32, 8].
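To make the expected shapes concrete, here is a minimal base-R sketch of the mapping described above. The plan is copied from the head() output earlier; `expected_shape` is a hypothetical helper of mine, not part of mxnet, and it assumes the sequence length is the bucket name minus 1.

```r
# Bucket plan copied from the head() output above: names are bucket IDs,
# values are the batch number within that bucket.
bucket_plan <- c("8" = 1, "9" = 1, "10" = 1, "3" = 1, "22" = 1,
                 "10" = 2, "10" = 3, "8" = 2, "15" = 1, "21" = 1)
batch_size <- 32

# Hypothetical helper (not an mxnet function): expected [batch, steps] shape
# for the n-th plan entry, assuming steps = bucket name - 1.
expected_shape <- function(plan, n, batch_size) {
  steps <- as.integer(names(plan)[n]) - 1L
  c(batch_size, steps)
}

shape_1 <- expected_shape(bucket_plan, 1, batch_size)  # bucket "8"
shape_2 <- expected_shape(bucket_plan, 2, batch_size)  # bucket "9"
print(shape_1)
print(shape_2)
```

Under that assumption, entry 1 gives the "expected" shape from the error and entry 2 gives the "got" shape, which is why I suspect the index is off by one.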

On other occasions, the first batch trains successfully, but the second batch throws an error: the input dimensions for the bucket at bucket.plan[n] are expected, but the dimensions for bucket.plan[n-1] are received instead:

Start training with 1 devices
Error in exec$update.arg.arrays(arg.arrays, match.name, skip.null) : 
  [17:37:53] c:\jenkins\workspace\mxnet\mxnet\src\operator\tensor\../elemwise_op_common.h:123: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node  at 0-th output: expected [32,42], got [32,10]
> head(train_data$bucket.plan, 10)
11 43  8  6  7  3  6 12 15 12 
 1  1  1  1  1  1  2  1  1  2 
> train_data$batch
[1] 2
> train_data$bucketID
43 
 1 
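For reference, this is roughly how I map the shapes in the error message back to positions in bucket.plan. It is a base-R sketch only: `shape_to_bucket` is a hypothetical helper, and it again assumes bucket name = sequence length + 1.

```r
# Tail of the second error message above, copied verbatim.
err <- "Incompatible attr in node  at 0-th output: expected [32,42], got [32,10]"

# Pull every "[a,b]" shape out of the message.
shapes <- regmatches(err, gregexpr("\\[[0-9]+,[0-9]+\\]", err))[[1]]

# Hypothetical helper: turn a shape string such as "[32,42]" into a bucket
# name, assuming bucket name = sequence length + 1.
shape_to_bucket <- function(shape) {
  dims <- as.integer(strsplit(gsub("\\[|\\]", "", shape), ",")[[1]])
  as.character(dims[2] + 1L)
}

# Bucket names from head(train_data$bucket.plan, 10) in the same session.
plan_names <- c("11", "43", "8", "6", "7", "3", "6", "12", "15", "12")

expected_bucket <- shape_to_bucket(shapes[1])
got_bucket <- shape_to_bucket(shapes[2])

# Position of each bucket's first occurrence in the plan.
expected_pos <- match(expected_bucket, plan_names)
got_pos <- match(got_bucket, plan_names)
```

Here the expected shape resolves to the bucket at plan position 2 (the current value of train_data$batch) and the received shape to the bucket at position 1.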

There doesn't appear to be a mismatch between the data and labels:

> dim(train_data$buckets[[names(train_data$bucketID)]]$data)
[1]  42 186
> dim(train_data$buckets[[names(train_data$bucketID)]]$label)
[1]  42 186
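The same check can be run over every bucket rather than just the current one. The sketch below uses a toy stand-in for train_data$buckets (the sample count for bucket 8 is made up), and `check_buckets` is a hypothetical helper that assumes each bucket is a list with `data` and `label` matrices shaped [steps, samples].

```r
# Toy stand-in for train_data$buckets; only the 42 x 186 dims come from the
# output above, the rest is illustrative.
toy_buckets <- list(
  "8"  = list(data = matrix(0, 7, 64),   label = matrix(0, 7, 64)),
  "43" = list(data = matrix(0, 42, 186), label = matrix(0, 42, 186))
)

# Hypothetical helper: for each bucket, report the dims and whether
# (a) data and label dims agree and (b) steps == bucket name - 1.
check_buckets <- function(buckets) {
  do.call(rbind, lapply(names(buckets), function(id) {
    d <- dim(buckets[[id]]$data)
    l <- dim(buckets[[id]]$label)
    data.frame(bucket = id,
               steps = d[1],
               n_samples = d[2],
               label_matches = identical(d, l),
               len_matches_name = d[1] == as.integer(id) - 1L,
               stringsAsFactors = FALSE)
  }))
}

report <- check_buckets(toy_buckets)
print(report)
```

Calling `check_buckets(train_data$buckets)` on the real iterator should flag any bucket whose data and label disagree, or whose length does not match its name.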

The scripts that I believe are at issue here are incubator-mxnet/R-package/R/mx.io.bucket.iter.R and incubator-mxnet/R-package/R/executor.R. For some reason, the indices for iterating through bucket.plan appear not to be updating correctly.

I've found some other threads on this topic, but nothing that has helped me resolve the error.

Any ideas?

Conner M.