How to feed and build an "Input->Dense->Conv2D->Dense" network in Keras?

Question

This is a simple example that reproduces my issue in a network I am trying to deploy.

I have an image input layer (which I need to keep), then a Dense layer, a Conv2D layer, and another Dense layer.

The idea is that the inputs are 10x10 images and the labels are also 10x10 images. The code below is based on my own code and this example.

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Conv2D

#Building model
size=10
a = Input(shape=(size,size,1))
hidden = Dense(size)(a)
hidden = Conv2D(kernel_size = (3,3), filters = size*size, activation='relu', padding='same')(hidden)
outputs = Dense(size, activation='sigmoid')(hidden)

model = Model(inputs=a, outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

#Create random data with a single channel axis
n_images=55
data = np.random.randint(0,2,(n_images,size,size,1))
labels = np.random.randint(0,2,(n_images,size,size,1))

#Fit model
model.fit(data, labels, verbose=1, batch_size=10, epochs=20)

print(model.summary())

I get the following error: ValueError: Error when checking target: expected dense_92 to have shape (10, 10, 10) but got array with shape (10, 10, 1)

I don't get an error if I change:

outputs = Dense(size, activation='sigmoid')(hidden)

with:

outputs = Dense(1, activation='sigmoid')(hidden)

I don't understand why Dense(1) is valid here, or how it produces a 10x10 output, as model.summary() indicates:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_26 (InputLayer)        (None, 10, 10, 1)         0         
_________________________________________________________________
dense_93 (Dense)             (None, 10, 10, 10)        20        
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 10, 10, 100)       9100      
_________________________________________________________________
dense_94 (Dense)             (None, 10, 10, 1)         101       
=================================================================
Total params: 9,221
Trainable params: 9,221
Non-trainable params: 0
_________________________________________________________________
None
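
The parameter counts in this summary can be reproduced by hand, which also shows why `Dense(1)` is valid: a Dense layer only looks at the last axis of its input. The sketch below assumes channels-last tensors; `dense_params` and `conv2d_params` are hypothetical helpers written for illustration, not Keras functions.

```python
def dense_params(in_features, units):
    # One weight per (input feature, unit) pair plus one bias per unit.
    return in_features * units + units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter spans kernel_h x kernel_w x in_channels and has one bias.
    return kernel_h * kernel_w * in_channels * filters + filters


size = 10
first_dense = dense_params(1, size)            # last axis of (10, 10, 1) has 1 feature
conv = conv2d_params(3, 3, size, size * size)  # 10 input channels, 100 filters
last_dense = dense_params(size * size, 1)      # last axis of (10, 10, 100) has 100 features
total = first_dense + conv + last_dense

print(first_dense, conv, last_dense, total)    # 20 9100 101 9221
```

None of the counts depends on the 10x10 spatial size, because the same Dense weights are reused at every pixel.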

If the input images are 10x10, then why is `size=100`? I think that is what has confused you. Also, as you may or may not know, [the dense layer is applied on the last axis](https://stackoverflow.com/a/52092176/2099607); so, as you can see in the model summary, `Dense(1)` is applied on the last axis of the output of the convolution layer before it. — today, Sep 17 '18 at 08:37
@today, `size=100` was a typo. So how can I apply a Dense on the actual 10x10 matrix result of Conv2D? — 0x90, Sep 17 '18 at 11:26
You are already applying a dense layer, though on activation of all filters for each pixel i.e. `(1,1,100)`. If you would like to apply the dense layer on the whole output of convolution layer, put a Flatten layer after it and then use the Dense layer. However, it is better to use at least one more combination of maxpooling2d and conv2d layers to decrease the capacity (i.e. size) of network and then use flatten and dense layer at the end. — today, Sep 17 '18 at 11:31
@today why is the output of the 2D convolution (10,10,100): `conv2d_9 (Conv2D) (None, 10, 10, 100)`? I expected a matrix of size 10x10. I guess it is because I create size*size filters. But how can I preserve the contribution from each pixel properly? — 0x90, Sep 17 '18 at 11:35
Yes, that's because you have 100 filters: [:, :, 0] is the response of the first filter, [:, :, 1] is the response of the second filter, and so on (and each of them is a 10x10 matrix). I don't understand what you mean by "preserve the contribution from each pixel properly". Do you know how a convolution layer works? Or tell me specifically what you are trying to achieve (classify the images, do segmentation, etc.) and maybe I could help better. — today, Sep 17 '18 at 11:48
@today what I am trying to do isn't standard. I have a set of images, and for each image I want to find a binary image of the same size, where a pixel value of 1 means the feature exists in the input image. — 0x90, Sep 17 '18 at 11:57
@today the insight into whether a pixel has a feature should come from both local information (extracted by convolution layers) and global information (extracted by Dense layers). — 0x90, Sep 17 '18 at 12:04
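
The two options discussed in these comments (Dense applied directly to the convolution output versus Flatten followed by Dense) can be compared with plain NumPy. This is only a sketch of the shape behaviour: the random arrays stand in for real activations and weights, the sigmoid activation is left out, and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Conv2D output in the question: (batch, height, width, filters).
conv_out = rng.random((5, 10, 10, 100))

# Dense(1) on a 4D tensor: a single (100, 1) kernel shared by every pixel.
kernel = rng.random((100, 1))
bias = np.zeros(1)
per_pixel = conv_out @ kernel + bias
print(per_pixel.shape)   # (5, 10, 10, 1)

# Flatten first, then Dense(100): every output unit sees every pixel and filter.
flat = conv_out.reshape(len(conv_out), -1)   # (5, 10000)
global_kernel = rng.random((flat.shape[1], 100))
global_out = flat @ global_kernel + np.zeros(100)
print(global_out.shape)  # (5, 100)
```

In the first case each output pixel depends only on the 100 filter responses at that same pixel, which is why the labels must have shape (10, 10, 1) for `Dense(1)` and would need shape (10, 10, 10) for `Dense(size)`.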

score 4 · Accepted Answer · edited Oct 03 '18 at 23:59

Well, according to your comments:

what I am trying to do isn't standard. I have a set of images, and for each image I want to find a binary image of the same size, where a pixel value of 1 means the feature exists in the input image.

the insight into whether a pixel has a feature should come from both local information (extracted by convolution layers) and global information (extracted by Dense layers).

I guess you are looking to create a two-branch model, where one branch consists of convolution layers and the other is simply one or more dense layers stacked on top of each other. (That said, in my opinion a single convolutional network may achieve what you are looking for, because the combination of pooling and convolution layers, perhaps followed by some up-sampling layers at the end, preserves both local and global information to some extent.) To define such a model, you can use the Keras functional API like this:

from keras import models
from keras import layers

input_image = layers.Input(shape=(10, 10, 1))

# branch one: dense layers
b1 = layers.Flatten()(input_image)
b1 = layers.Dense(64, activation='relu')(b1)
b1_out = layers.Dense(32, activation='relu')(b1)

# branch two: conv + pooling layers
b2 = layers.Conv2D(32, (3,3), activation='relu')(input_image)
b2 = layers.MaxPooling2D((2,2))(b2)
b2 = layers.Conv2D(64, (3,3), activation='relu')(b2)
b2_out = layers.MaxPooling2D((2,2))(b2)

# merge two branches
flattened_b2 = layers.Flatten()(b2_out)
merged = layers.concatenate([b1_out, flattened_b2])

# add a final dense layer
output = layers.Dense(10*10, activation='sigmoid')(merged)
output = layers.Reshape((10,10))(output)

# create the model
model = models.Model(input_image, output)

model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.summary()

Model summary:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 10, 10, 1)    0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 8, 8, 32)     320         input_1[0][0]                    
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 4, 4, 32)     0           conv2d_1[0][0]                   
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 100)          0           input_1[0][0]                    
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 2, 2, 64)     18496       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 64)           6464        flatten_1[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 1, 1, 64)     0           conv2d_2[0][0]                   
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 32)           2080        dense_1[0][0]                    
__________________________________________________________________________________________________
flatten_2 (Flatten)             (None, 64)           0           max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 96)           0           dense_2[0][0]                    
                                                                 flatten_2[0][0]                  
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 100)          9700        concatenate_1[0][0]              
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 10, 10)       0           dense_3[0][0]                    
==================================================================================================
Total params: 37,060
Trainable params: 37,060
Non-trainable params: 0
__________________________________________________________________________________________________
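
The spatial sizes in the convolution branch of this summary follow from simple arithmetic, sketched below. It assumes the defaults used in the code above: `padding='valid'` with stride 1 for Conv2D, and a pooling stride equal to the pool size. `conv_valid` and `max_pool` are hypothetical helpers, not Keras functions.

```python
def conv_valid(size, kernel):
    # 'valid' padding, stride 1: the kernel must fit entirely inside the input.
    return size - kernel + 1


def max_pool(size, pool):
    # Non-overlapping windows; leftover rows/columns are dropped.
    return size // pool


side = 10
trace = [side]
for op, arg in [(conv_valid, 3), (max_pool, 2), (conv_valid, 3), (max_pool, 2)]:
    side = op(side, arg)
    trace.append(side)

print(trace)         # [10, 8, 4, 2, 1]

# Merged vector: 32 units from the dense branch plus 1*1*64 from the conv branch.
merged_width = 32 + side * side * 64
print(merged_width)  # 96
```

Because the conv branch shrinks to 1x1 here, a larger input or fewer pooling layers would change the width of the merged vector and the parameter count of the final Dense layer.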

Note that this is only one way of achieving what you are looking for, and it may or may not work for your specific problem and data. You may modify this model (e.g. remove the pooling layers or add more dense layers) or use a completely different architecture with other kinds of layers (e.g. up-sampling, Conv2DTranspose) to reach better accuracy. In the end, you must experiment to find what works best.
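
As one illustration of such an alternative, here is a minimal sketch of a convolution-only model that keeps the 10x10 resolution by using `padding='same'`. It is an assumption about what a different architecture could look like, not a recommendation from the answer; the layer sizes are arbitrary and the name `fcn` is made up.

```python
from keras import layers, models

input_image = layers.Input(shape=(10, 10, 1))

# padding='same' keeps the 10x10 spatial size through every convolution.
x = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(input_image)
x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)

# A 1x1 convolution with a single filter gives one sigmoid score per pixel.
output = layers.Conv2D(1, (1, 1), activation='sigmoid')(x)

fcn = models.Model(input_image, output)
fcn.compile(optimizer='rmsprop', loss='binary_crossentropy')
print(fcn.output_shape)  # (None, 10, 10, 1)
```

With only two 3x3 convolutions, each output pixel sees a 5x5 neighbourhood of the input, so this sketch captures local context only; more layers, or pooling followed by up-sampling, would be needed to bring in wider context.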

Edit:

For completeness, here is how to generate data and fit the network:

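import numpy as np

size = 10
n_images = 10
data = np.random.randint(0, 2, (n_images, size, size, 1))
# labels have no channel axis, so they match the model's (None, 10, 10) output
labels = np.random.randint(0, 2, (n_images, size, size))
model.fit(data, labels, verbose=1, batch_size=32, epochs=20)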

Thank you, let's say I have a multi channel image. How would this setup need to change? — 0x90, Sep 17 '18 at 12:50
@0x90 Nothing except the input shape `shape=(10, 10, num_channels)` and output shape `Reshape((10,10, num_channels))`. — today, Sep 17 '18 at 13:05
Cool! the output should remain single channel, so no need to reshape. — 0x90, Sep 17 '18 at 13:06
@0x90 So there is no need to even change the output shape :) — today, Sep 17 '18 at 13:07
Last question: when I flatten the input and then call Conv2D, how does Keras know how to reshape it into a matrix before applying the convolution? — 0x90, Sep 17 '18 at 13:08
@0x90 You flatten the image to feed it to Dense layers. The input of Conv layers is the image itself. Look at the code. BTW, that reshape at the end is necessary if you specifically want the output shape of the network to be `(10,10)` and not `(100,)`. — today, Sep 17 '18 at 13:11
Interesting, keras adds a flatten layer between the pooling and the Conv2D: see in the summary: `max_pooling2d_1`, `flatten_1`, and `conv2d_2` — 0x90, Sep 17 '18 at 13:14
@0x90 No, That's the flatten layer before the dense layers. The order of the layers in `model.summary()` does not indicate the connections like in Sequential models. Instead, look at the "connected to" column to find out layer connections. This is a two branch model created using functional API. — today, Sep 17 '18 at 13:16
I am sorry for asking another question, but could you please include an example of how to generate `data` and `labels` and call `model.fit(data, labels, verbose=1, batch_size=10, epochs=20)`? — 0x90, Oct 03 '18 at 18:52
@0x90 No problem. But I don't understand what is ambiguous for you? If you would like random data then the approach you have used for generating it in your question is correct and there is nothing special or wrong about the `fit` call. If you would like real data, well first you need to get some from internet or datasets like imagenet or mnist and then preprocess them to make them ready for training. If you care to elaborate more I can help you better. Or alternatively, if you have encountered any errors/issues you can post a new question with enough details so others and I could help you. — today, Oct 03 '18 at 19:14
@0x90 Ah! Now I see what you meant... I was thinking to myself what is special or ambiguous about fitting the model in this case and therefore I could not decide what to write... no problem, thanks for adding it. — today, Oct 04 '18 at 00:07
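
The multi-channel case discussed in these comments can be sketched by wrapping the answer's layers in a function whose only variable part is the input shape. `build_two_branch` and `rgb_model` are hypothetical names introduced here; the layers themselves are the ones from the answer, and the output stays single-channel as agreed in the comments.

```python
from keras import layers, models


def build_two_branch(num_channels, size=10):
    # Same layers as the answer; only the input shape depends on num_channels.
    input_image = layers.Input(shape=(size, size, num_channels))

    # Branch one: dense layers on the flattened image.
    b1 = layers.Flatten()(input_image)
    b1 = layers.Dense(64, activation='relu')(b1)
    b1_out = layers.Dense(32, activation='relu')(b1)

    # Branch two: conv + pooling layers on the image itself.
    b2 = layers.Conv2D(32, (3, 3), activation='relu')(input_image)
    b2 = layers.MaxPooling2D((2, 2))(b2)
    b2 = layers.Conv2D(64, (3, 3), activation='relu')(b2)
    b2_out = layers.MaxPooling2D((2, 2))(b2)

    # Merge the branches and map to one sigmoid score per output pixel.
    merged = layers.concatenate([b1_out, layers.Flatten()(b2_out)])
    output = layers.Dense(size * size, activation='sigmoid')(merged)
    output = layers.Reshape((size, size))(output)
    return models.Model(input_image, output)


rgb_model = build_two_branch(num_channels=3)
print(rgb_model.input_shape, rgb_model.output_shape)
```

The image tensor feeds both branches directly, so the Flatten layer affects only the dense branch, as the "Connected to" column of the summary shows.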
