
There are a few parameters in the config below that I don't fully understand, and I run into errors particularly when I change max_len, hidden_size or embedding_size.

config = {
    "max_len": 64,
    "hidden_size": 64,
    "vocab_size": vocab_size,
    "embedding_size": 128,
    "n_class": 15,
    "learning_rate": 1e-3,
    "batch_size": 32,
    "train_epoch": 20
}

I get an error:

"ValueError: Cannot feed value of shape (32, 32) for Tensor 'Placeholder:0', which has shape '(?, 64)'"

The TensorFlow graph below is what I have trouble understanding. Is there a way to work out how max_len, hidden_size and embedding_size need to relate to each other (and to the input data) so that I avoid the error above?

        embeddings_var = tf.Variable(tf.random_uniform([self.vocab_size, self.embedding_size], -1.0, 1.0),
                                     trainable=True)
        batch_embedded = tf.nn.embedding_lookup(embeddings_var, self.x)
        # multi-head attention
        ma = multihead_attention(queries=batch_embedded, keys=batch_embedded)
        # FFN(x) = LN(x + point-wisely NN(x))
        outputs = feedforward(ma, [self.hidden_size, self.embedding_size])
        outputs = tf.reshape(outputs, [-1, self.max_len * self.embedding_size])
        logits = tf.layers.dense(outputs, units=self.n_class)

        self.loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=self.label))
        self.prediction = tf.argmax(tf.nn.softmax(logits), 1)

        # optimization
        loss_to_minimize = self.loss
        tvars = tf.trainable_variables()
        gradients = tf.gradients(loss_to_minimize, tvars, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)
        grads, global_norm = tf.clip_by_global_norm(gradients, 1.0)

        self.global_step = tf.Variable(0, name="global_step", trainable=False)
        self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
        self.train_op = self.optimizer.apply_gradients(zip(grads, tvars), global_step=self.global_step,
                                                       name='train_step')
        print("graph built successfully!")
HumanTorch

1 Answer


max_len is the length (in tokens) of the longest sentence/document in your training set. It is the second dimension of your input tensor (the first being the batch dimension).

Each sentence will be padded (or truncated) to this length. Attention models need a predefined maximum length because each token position gets its own attention weight.
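
Concretely, the error in the question says a batch of shape (32, 32) was fed into a placeholder of shape (?, 64): the data was padded to only 32 tokens while max_len is 64, so either pad to 64 or set max_len to the length you actually pad to. A minimal padding sketch (the numpy helper and its names are my own illustration, not part of the question's code):

import numpy as np

MAX_LEN = 64  # must match config["max_len"], i.e. the placeholder shape (?, 64)

def pad_batch(token_id_lists, max_len=MAX_LEN, pad_id=0):
    # Pad (or truncate) every sequence of token ids to exactly max_len.
    batch = np.full((len(token_id_lists), max_len), pad_id, dtype=np.int32)
    for i, ids in enumerate(token_id_lists):
        ids = ids[:max_len]            # drop tokens beyond max_len
        batch[i, :len(ids)] = ids      # left-align tokens, the rest stays padding
    return batch

x_batch = pad_batch([[5, 8, 2], [7, 1, 9, 4]])
print(x_batch.shape)  # (2, 64) -- the second dimension now matches the placeholder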

hidden_size is the size of the hidden RNN cell, i.e. the width of what is output at each time step; it can be set to more or less anything. (In your graph there is no RNN: it is used as the inner width of the feedforward block.)

embedding_size defines the dimensionality of the token representation (e.g. 300 is standard for word2vec, 1024 for large BERT embeddings, etc.).
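
To see how the three parameters interact in this particular graph, here is a stripped-down shape sketch. It replaces multihead_attention and feedforward with plain dense layers that keep the same output shapes, so it is only an approximation of the question's code, not a drop-in replacement:

import tensorflow as tf  # TF 1.x style, as in the question

max_len, hidden_size, embedding_size = 64, 64, 128
vocab_size, n_class = 10000, 15  # vocab_size is just an example value here

x = tf.placeholder(tf.int32, [None, max_len])                     # (?, 64) token ids
emb = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0))
h = tf.nn.embedding_lookup(emb, x)                                # (?, 64, 128)
# stand-in for multihead_attention + feedforward: keeps (?, max_len, embedding_size)
h = tf.layers.dense(tf.layers.dense(h, hidden_size, tf.nn.relu), embedding_size)
h = tf.reshape(h, [-1, max_len * embedding_size])                 # (?, 64 * 128) = (?, 8192)
logits = tf.layers.dense(h, n_class)                              # (?, 15)

hidden_size and embedding_size only change the sizes of weights inside the graph, so you can vary them freely; max_len is the only one that has to agree with how the input batches are padded.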

Szymon Maszke
  • How would one input an embedded vector output with a (,1024) shape taken from BERT into this attention-based model aside from just changing the embedding_size parameter? – HumanTorch Apr 16 '19 at 11:36
  • It should be enough, I suppose; if you encounter an error during this operation, open a new issue, as I'm on mobile and can't test code right now. – Szymon Maszke Apr 16 '19 at 11:53
  • Added a new issue here https://stackoverflow.com/questions/55709025/how-do-i-pass-bert-embeddings-into-an-attention-based-model @Szymon Maszke – HumanTorch Apr 16 '19 at 13:12