I have an existing model, trained to extract text from images of a constant size, whose final layers are a two-layer bidirectional recurrent stack. I am adding a convolution layer to the architecture, just before the recurrent stack, so that text can be detected in a sliding-window fashion over larger images. The model looks like this:
Layer (type)                 Output Shape              Param #
=================================================================
the_input (InputLayer)       (None, 64, 128, 3)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 64, 128, 48)       3648
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 64, 48)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 64, 64)        76864
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 64, 64)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 64, 128)       204928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 32, 128)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 1, 1, 2048)        67110912
_________________________________________________________________
lambda_1 (Lambda)            (None, 1, 1, 1)           0
=================================================================
conv2d_4 is the new convolution layer meant to enable images of sizes other than 128 x 64; with an 8 x 32 kernel and valid padding it collapses the entire 8 x 32 x 128 feature map of one window into a single spatial position. lambda_1 is a presence indicator producing one value per sliding-window position. The code that creates the model above is as follows:
from keras import backend as K
from keras.layers import Conv2D, Input, Lambda, MaxPooling2D
from keras.models import Model

input_shape = (64, 128, 3)  # height, width, channels of the original images
time_dense_size = 2048      # filter count of conv2d_4 in the summary above

input_data = Input(name='the_input', shape=input_shape)

def add_conv(prev_component, kernel_dims, num_filters, max_pool_dims=None,
             padding='same'):
    cnv = Conv2D(num_filters, kernel_dims, activation='relu',
                 padding=padding)(prev_component)
    if max_pool_dims is not None:
        cnv = MaxPooling2D(pool_size=max_pool_dims)(cnv)
    return cnv

prev = input_data
prev = add_conv(prev, (5, 5), 48, max_pool_dims=(2, 2))
prev = add_conv(prev, (5, 5), 64, max_pool_dims=(2, 1))
prev = add_conv(prev, (5, 5), 128, max_pool_dims=(2, 2))
# Weights: [kernel_height, kernel_width, prev_filter_count, new_filter_count].
# The (8, 32) valid-padding kernel spans the whole 8 x 32 x 128 feature map,
# so conv2d_4 acts as a fully connected layer applied once per window.
prev = add_conv(prev, (8, 32), time_dense_size, padding='valid')

def is_plate_func(windowed_data):
    # 1 x 1 convolution reducing the 2048 features at each window position
    # to a single presence score. Variables created inside a Lambda are not
    # tracked by Keras, which is why lambda_1 reports 0 parameters above.
    is_plate_w = K.variable(K.truncated_normal(stddev=0.1, shape=(1, 1, 2048, 1)))
    is_plate_b = K.variable(K.constant(.1, shape=[1]))
    is_plate_out = K.bias_add(K.conv2d(windowed_data, is_plate_w), is_plate_b)
    return is_plate_out

is_plate_lambda = Lambda(is_plate_func)(prev)
Model(inputs=input_data, outputs=[is_plate_lambda]).summary()
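For context, the point of making the head fully convolutional is that the same stack, rebuilt over a larger input, makes lambda_1 yield a grid of presence scores, one per window. A minimal sketch of what I mean (reusing add_conv and is_plate_func from above; the 128 x 256 input size is just an example):

big_input = Input(name='big_input', shape=(128, 256, 3))
x = add_conv(big_input, (5, 5), 48, max_pool_dims=(2, 2))   # -> (64, 128, 48)
x = add_conv(x, (5, 5), 64, max_pool_dims=(2, 1))           # -> (32, 128, 64)
x = add_conv(x, (5, 5), 128, max_pool_dims=(2, 2))          # -> (16, 64, 128)
x = add_conv(x, (8, 32), time_dense_size, padding='valid')  # -> (9, 33, 2048)
is_plate_grid = Lambda(is_plate_func)(x)                    # -> (9, 33, 1)
Model(inputs=big_input, outputs=[is_plate_grid]).summary()

Each of the 9 x 33 output positions corresponds roughly to one original-size 64 x 128 window of the larger image.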
The code for the recurrent stack, which I would like to attach to the model per window in the same fashion as lambda_1, is as follows:
from keras.layers import Bidirectional, Dense, GRU, Reshape
from keras.models import Sequential

num_backward_layers = 2     # the stack is two bidirectional layers deep
rnn_size = 512              # example value; actual GRU width defined elsewhere
num_output_characters = 36  # example value; size of the character set

bdrnn_model = Sequential()
# Treat the 2048 features of a single window as a length-2048 sequence.
bdrnn_model.add(Reshape((2048, 1), input_shape=(1, 1, 2048)))
for idx in range(num_backward_layers):
    # Sum the two directions between layers, concatenate on the last layer.
    if idx == num_backward_layers - 1:
        merge_mode = 'concat'
    else:
        merge_mode = 'sum'
    bdrnn_model.add(Bidirectional(GRU(rnn_size, return_sequences=True),
                                  merge_mode=merge_mode))
bdrnn_model.add(Dense(num_output_characters + 2, activation='relu'))
Is there a way to get Keras to use 'bdrnn_model' as a convolution filter, applying it once at every sliding-window position?
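To make the question concrete, the wiring I am imagining looks roughly like the following, using TimeDistributed over a flattened grid of window positions. This is only a sketch: it hard-codes the 9 x 33 grid from the 128 x 256 example above and reuses its x tensor, and I don't know whether this is the idiomatic approach:

from keras.layers import TimeDistributed

# Flatten the 9 x 33 grid of windows into one "time" axis and run the
# recurrent stack once per window position.
windows = Reshape((9 * 33, 1, 1, 2048))(x)
per_window_text = TimeDistributed(bdrnn_model)(windows)
# per_window_text shape: (None, 297, 2048, num_output_characters + 2)

Is something along these lines the right way to do it, and is there a variant that also works when the input size, and therefore the window grid, is not fixed?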