
I am trying to understand what "add" does. Why is the output of the Add layer (None, 38, 300) when it adds two inputs with different shapes here?

Following is the partial code in Keras.

from keras.layers import (Input, Dense, BatchNormalization, Embedding,
                          Dropout, LSTM, Activation, add)

# The hyperparameters are not shown in the original snippet; these values
# are inferred from the reported Add output shape (None, 38, 300).
EMBEDDING_DIM = 300
MAX_CAPTION_SIZE = 38
VOCABULARY_SIZE = 10000  # placeholder; the real value is not given

# Image branch: project a 2048-d image feature vector to EMBEDDING_DIM.
image_model = Input(shape=(2048,))
x = Dense(units=EMBEDDING_DIM, activation="relu")(image_model)  # (None, 300)
x = BatchNormalization()(x)

# Language branch: embed each caption token to EMBEDDING_DIM.
language_model = Input(shape=(MAX_CAPTION_SIZE,))
y = Embedding(input_dim=VOCABULARY_SIZE, output_dim=EMBEDDING_DIM)(language_model)  # (None, 38, 300)
y = Dropout(0.5)(y)

merged = add([x, y])  # (None, 38, 300)
merged = LSTM(256, return_sequences=False)(merged)
merged = Dense(units=VOCABULARY_SIZE)(merged)
merged = Activation("softmax")(merged)


user3267989

1 Answer


Why is the output of Add layer (None, 38, 300) when adding two inputs with different shapes here?

It's a technique called broadcasting. You can find more details here: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
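For intuition, the same rule can be seen in plain NumPy (a minimal sketch; the shapes mirror the Keras example that follows):

import numpy as np

a = np.ones((16,))    # shape (16,)
b = np.ones((2, 16))  # shape (2, 16)

# NumPy aligns trailing dimensions: a is stretched along the missing
# axis of size 2, so the result has shape (2, 16).
c = a + b
print(c.shape)  # (2, 16)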

In the Keras example below, the first input, of shape (16,), is broadcast along the second dimension (of size 2) of the second input, of shape (2, 16), so that the element-wise addition can happen.

import keras
import numpy as np

input1 = keras.layers.Input(shape=(16,))      # batch shape (None, 16)
input2 = keras.layers.Input(shape=(2, 16))    # batch shape (None, 2, 16)
added = keras.layers.Add()([input1, input2])  # input1 is broadcast to (None, 2, 16)
model = keras.models.Model(inputs=[input1, input2], outputs=added)
output = model.predict([np.ones((1, 16)), np.ones((1, 2, 16))])
print(output.shape)
print(output)

(1, 2, 16)

[[[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
  [2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]]
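The same rule explains the original model. Assuming EMBEDDING_DIM is 300 and MAX_CAPTION_SIZE is 38 (which the reported output shape implies), x has shape (None, 300) and y has shape (None, 38, 300), so x is broadcast along the 38 caption timesteps and the Add layer outputs (None, 38, 300).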

Manoj Mohan