I was reading and coding for a machine translation task and stumbled across two different tutorials.
One of them is an implementation of the caption-generation-with-visual-attention paper, where they use image features of shape (64, 2048)
in such a way that each image is treated as a sentence of 64 words, with each word having an embedding of length 2048. I totally get that implementation, and here is the code for Bahdanau's additive-style attention:
import tensorflow as tf

class BahdanauAttention(tf.keras.Model):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, 64, 2048) image features; hidden: (batch, units) decoder state
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        # score each of the 64 locations against the current decoder state
        attention_hidden_layer = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))
        score = self.V(attention_hidden_layer)
        # normalize the scores over the 64 locations
        attention_weights = tf.nn.softmax(score, axis=1)
        # weighted sum of the features -> (batch, 2048) context vector
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
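To check that I am reading the shapes correctly, here is a minimal sketch of how I understand this layer gets called (the batch size of 16 and units=512 are made-up values just for illustration):

import tensorflow as tf

attention = BahdanauAttention(units=512)
features = tf.random.normal((16, 64, 2048))  # encoder output: 64 "words", 2048-dim each
hidden = tf.random.normal((16, 512))         # decoder GRU state (assumed units=512)

context_vector, attention_weights = attention(features, hidden)
print(context_vector.shape)     # (16, 2048) - weighted sum over the 64 locations
print(attention_weights.shape)  # (16, 64, 1) - one weight per image location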
But when I moved on to the neural machine translation task, I found this more complex implementation, and I am not able to comprehend what is happening here:
class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units, use_bias=False)
        self.W2 = tf.keras.layers.Dense(units, use_bias=False)
        self.attention = tf.keras.layers.AdditiveAttention()

    def call(self, query, value, mask):
        # query: (batch, t, units) decoder output; value: (batch, s, units) encoder output
        w1_query = self.W1(query)
        w2_key = self.W2(value)

        # every query position is real; the value mask hides source padding
        query_mask = tf.ones(tf.shape(query)[:-1], dtype=bool)
        value_mask = mask

        context_vector, attention_weights = self.attention(
            inputs=[w1_query, value, w2_key],
            mask=[query_mask, value_mask],
            return_attention_scores=True,
        )
        return context_vector, attention_weights
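For comparison, here is my sketch of how this second layer would be called (again, the batch size of 16, query length t=5, source length s=10, and units=512 are assumptions for illustration only):

import tensorflow as tf

attention = BahdanauAttention(units=512)

query = tf.random.normal((16, 5, 512))   # decoder RNN output: (batch, t, units)
value = tf.random.normal((16, 10, 512))  # encoder output: (batch, s, units)
mask = tf.ones((16, 10), dtype=bool)     # True where the source token is not padding

context_vector, attention_weights = attention(query, value, mask)
print(context_vector.shape)     # (16, 5, 512) - one context vector per query step
print(attention_weights.shape)  # (16, 5, 10) - weights over the source positions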
I want to ask:
- What is the difference between the two implementations?
- Why can't we use the caption-generation code in the translation task, or vice versa?