Replace MLP with CNN

Question

I have built an NN with the following architecture:

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
(1901, 456, 3) (476, 456, 3) (1901, 3, 3) (476, 3, 3)

model = Sequential()

model.add(Flatten(input_shape=(456,3)))

model.add(Dense(64, activation='relu'))

model.add(Dense(32, activation='relu'))

model.add(Dense(3 * 3))

model.add(Reshape((3, 3)))

model.compile('adam', 'mse')

history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=100)

Now I want to replace this architecture with an analogous CNN that does the same thing, but when I try to implement it I always run into problems with the dimensions of the different layers. The error always looks like this:

ValueError: Error when checking input: expected conv2d_3_input to have 4 dimensions, but got array with shape (x, x, x)

The dataset remains the same; only the NN architecture changes. This is my first approach:

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
             activation='relu',
             input_shape=(1901,456,3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))

Can someone help me replace my first NN with a CNN?

Might be a duplicate question? https://stackoverflow.com/questions/48794214/expected-input-to-have-4-dimensions-but-got-array-with-shape — maxi.marufo, Jan 27 '20 at 12:54
If he tries the solution it won't work for him, and he may not be helped by it — Orphee Faucoz, Jan 27 '20 at 13:02

Orphee Faucoz · Accepted Answer · 2020-01-27T14:29:42.267

2

Your network is well defined; the error you're getting occurs during the fit operation. Here is why.

Conv2D expects data with a 4D shape, as the Conv2D documentation describes.

X_train must then be shaped (samples, rows, cols, channels) with the default channels_last data format, or (samples, channels, rows, cols) with channels_first.

When you gave input_shape=(1901,456,3), you described a single sample; the number of samples is never part of input_shape.

But during the fit operation the data must carry that leading samples axis as well.

And now you can see the problem. With X_train shaped (1901, 456, 3) and that input_shape, it looks as if you have only one image. You can feed it by reshaping it:

X_train = X_train.reshape((1, 1901, 456, 3))

But that seems odd: you would be feeding only one image to your network.
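
To make the dimension mismatch concrete, here is a small NumPy sketch. It uses a placeholder array of zeros with the shape printed in the question, and it assumes each of the 1901 entries is a separate sample. The trailing channel axis is only an illustration of how a 4D array could be built, not something the question asks for.

```python
import numpy as np

# Placeholder array with the shape printed in the question: 1901 samples of (456, 3).
X_train = np.zeros((1901, 456, 3), dtype=np.float32)

# input_shape describes ONE sample, so the batch axis is left out.
per_sample_shape = X_train.shape[1:]

# Conv2D wants 4 axes in total (batch + 3 per sample); this array has only 3.
print(X_train.ndim)

# Assumption for illustration: add a trailing channel axis of size 1,
# so that each sample is treated as a 456x3 single-channel "image".
X_as_images = X_train[..., np.newaxis]
print(per_sample_shape, X_as_images.shape)
```

With this layout the per-sample shape passed as input_shape would be (456, 3, 1), and the array handed to fit would have 4 dimensions.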

Edit: after the clarification in the comments, Conv1D is a better fit for this kind of data. Here is how to do it:

model = Sequential()
model.add(Conv1D(32, kernel_size=3,
             activation='relu',
             input_shape=(456,3)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3 * 3, activation='softmax'))
model.add(Reshape((3, 3)))
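
For anyone tracing the dimensions through the Conv1D stack above, the arithmetic can be sketched in plain Python. This assumes the Keras defaults (stride 1 and 'valid' padding for Conv1D, stride equal to pool_size for MaxPooling1D); the two helper functions are hypothetical names for illustration only.

```python
def conv1d_valid_length(length, kernel_size):
    """Output length of a stride-1 Conv1D with 'valid' padding."""
    return length - kernel_size + 1


def maxpool1d_length(length, pool_size):
    """Output length of MaxPooling1D with stride == pool_size, 'valid' padding."""
    return length // pool_size


steps = 456                              # points per sample
steps = conv1d_valid_length(steps, 3)    # after Conv1D(32, 3)
steps = conv1d_valid_length(steps, 3)    # after Conv1D(64, 3)
steps = maxpool1d_length(steps, 2)       # after MaxPooling1D(2)
flat_features = steps * 64               # Flatten: remaining steps x 64 filters
print(steps, flat_features)              # 226 14464
```

Dropout does not change any shape, so Flatten hands 14464 features to Dense(128).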

edited Jan 27 '20 at 14:29

answered Jan 27 '20 at 12:44

Orphee Faucoz

thank you for your response! now this is a bit clearer to me. my X_train is shaped like that because I want to feed in 1901 small samples of size (456,3). the (456,3) actually represents point particles with x,y,z coordinates. maybe this helps you understand my dataset – jeffs Jan 27 '20 at 13:01
Alright, in that case `conv2D` is not the best layer. Your data are more like signals than images. I would try `conv1D` instead. I'm editing my answer with the modification to your model – Orphee Faucoz Jan 27 '20 at 13:05
Also, you're using a softmax in your last layer. That confines each output to [0,1] and makes the outputs sum to 1, which is not well suited for regression. If that's what you want there is no problem, but since this is regression (mse) you should be using an unbounded activation such as `linear` (`relu` is also an option if your targets are never negative) – Orphee Faucoz Jan 27 '20 at 13:09
thank you, I implemented the things you mentioned but I get this error: TypeError: The added layer must be an instance of class Layer. Found: – jeffs Jan 27 '20 at 13:32
This is a problem between tf.keras and keras I would say. You should be using either only keras or tensorflow.keras. Here is a link to a similar error : [link](https://stackoverflow.com/questions/55407970/how-to-fix-typeerror-the-added-layer-must-be-an-instance-of-class-layer-in-p) – Orphee Faucoz Jan 27 '20 at 13:36
thank you so far, everything worked well! but when fitting the model I get the following error: see my post answer – jeffs Jan 27 '20 at 13:55
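
The remark about softmax in the comments above can be checked with a few lines of NumPy. The logits below are made-up numbers standing in for the 9 output units of one sample, and the softmax function is a hand-written sketch rather than the Keras implementation.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical pre-activation values for the 9 output units of one sample.
logits = np.array([-120.0, 3.5, 40.0, 0.0, 7.0, -2.0, 15.0, 1.0, 250.0])

soft = softmax(logits)          # every value in [0, 1], all values sum to 1
relu = np.maximum(logits, 0.0)  # negative values are cut to 0
linear = logits                 # identity: the range is left untouched

print(soft.min(), soft.max(), soft.sum())
print(relu.min(), relu.max())
print(linear.min(), linear.max())
```

If the regression targets are larger than 1 or negative, a softmax output cannot reach them no matter how long the network trains.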

score -1 · Answer 2 · answered Jan 27 '20 at 13:55

Now everything works with the architecture, and there is also no problem when compiling the NN:

batch_size = 128
epochs = 12


model.compile(
    optimizer='rmsprop',
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=['mse'],
)

model.fit(X_test, Y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)

But when trying to fit, I get the following error:

ValueError: Input arrays should have the same number of samples as target 
arrays. Found 476 input samples and 1901 target samples.

what am I missing here?
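
The error message compares the first axis of the inputs with the first axis of the targets, so a quick check before calling fit can narrow things down. This is a sketch with placeholder arrays of zeros in the shapes printed in the question; check_same_samples is a hypothetical helper, not a Keras function.

```python
import numpy as np

# Placeholder arrays with the shapes printed in the question.
X_train, Y_train = np.zeros((1901, 456, 3)), np.zeros((1901, 3, 3))
X_test, Y_test = np.zeros((476, 456, 3)), np.zeros((476, 3, 3))


def check_same_samples(inputs, targets):
    """Hypothetical helper: raise if inputs and targets disagree on axis 0."""
    if len(inputs) != len(targets):
        raise ValueError(
            f"{len(inputs)} input samples vs {len(targets)} target samples"
        )


check_same_samples(X_train, Y_train)  # matching pair, no error
check_same_samples(X_test, Y_test)    # matching pair, no error

try:
    check_same_samples(X_test, Y_train)  # the pair passed to fit in this post
except ValueError as err:
    print(err)
```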

answered Jan 27 '20 at 13:55

jeffs

You used X_test in `fit` instead of X_train :) – Orphee Faucoz Jan 27 '20 at 13:57
I'm sorry to keep bothering you with this but during the first epoch it says: InvalidArgumentError: Incompatible shapes: [128,3] vs. [128,3,3] [[node BroadcastGradientArgs_2 (defined at /usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_111903] Function call stack: distributed_function – jeffs Jan 27 '20 at 14:08
I see where it comes from. The output of your Dense layer is [1, 3] per sample, while a sample from Y_train is [1, 3, 3]. I don't know what you want to predict, but since you need a 3x3 output, your last layers should be either a `model.add(Dense(3*3, activation='softmax'))` followed by a `model.add(Reshape((3, 3)))`, or a Conv2D layer (though I don't know if that would help you). In any case, you need the same two final layers that you defined in your first NN – Orphee Faucoz Jan 27 '20 at 14:15
it worked very well with your first suggestion, thank you! – jeffs Jan 27 '20 at 14:36
Epoch 30/30 1901/1901 [==============================] - 3s 2ms/sample - loss: 29184.5207 - mse: 29184.5215 - val_loss: 2922.1123 - val_mse: 2922.1123 – jeffs Jan 27 '20 at 14:38
my loss is actually very high and the normal NN made better predictions for the same data set; do you know where this comes from? – jeffs Jan 27 '20 at 14:39
Can you replace `softmax` with `linear` in the activation of the Dense layer? In your first net you used linear. Here you're using `softmax`, which is misleading for regression – Orphee Faucoz Jan 27 '20 at 14:44
Also, you may try : `model.compile( optimizer=tf.keras.optimizers.Adam(lr=1e-3), loss='mse', metrics=['mse'] )` – Orphee Faucoz Jan 27 '20 at 14:45
I already replaced softmax by relu in the dense layer and the loss I posted before is with the right activation function; now I tried with your next suggestion and I get this:1901/1901 [==============================] - 3s 1ms/sample - loss: 31126.4235 - mse: 31126.4219 - val_loss: 4396.4118 - val_mse: 4396.4116 – jeffs Jan 27 '20 at 14:53
And with `linear`? – Orphee Faucoz Jan 27 '20 at 14:54
but the good news is that the NN is actually performing! I will try out a few modifications in optimization and architecture to get better predictions; thank you for your patience! – jeffs Jan 27 '20 at 14:54
Yeah but I would say that Conv1D architectures could perform better – Orphee Faucoz Jan 27 '20 at 14:56
for all training procedures I was using conv1D; and this is with linear: 1901/1901 [==============================] - 2s 1ms/sample - loss: 41835.6214 - mse: 41835.6172 - val_loss: 8648.2552 - val_mse: 8648.2559 – jeffs Jan 27 '20 at 14:57
And did it converge? – Orphee Faucoz Jan 27 '20 at 14:58
if you mean that the loss is not changing much within a few epochs, then yes, it converged; the loss stabilizes at around 40,000 – jeffs Jan 27 '20 at 15:04
Then I don't know, maybe ANNs are better than CNNs for your task. You can also try different architectures. Have a good day anyway! – Orphee Faucoz Jan 27 '20 at 15:06
yeah, I will try out a few things.. thank you, have a nice day! – jeffs Jan 27 '20 at 15:14
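
Pulling the suggestions from this thread together (Conv1D layers, a Dense(3 * 3) output with Reshape, a linear final activation, Adam with MSE), the model could be sketched as follows. This assumes TensorFlow 2.x and uses only tf.keras imports, which also avoids the keras / tf.keras mix mentioned earlier. The dummy batch is a placeholder used only to confirm the output shape; the sketch says nothing about whether the loss will improve.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(456, 3)),   # one sample: 456 points x 3 coordinates
    layers.Conv1D(32, kernel_size=3, activation="relu"),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3 * 3),              # no activation argument, so the output is linear
    layers.Reshape((3, 3)),           # match the (3, 3) shape of each target
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="mse",
    metrics=["mse"],
)

# Placeholder batch of 4 samples, only to confirm the output shape.
dummy = np.zeros((4, 456, 3), dtype="float32")
predictions = model.predict(dummy, verbose=0)
print(predictions.shape)
```

Training would then use model.fit(X_train, Y_train, validation_data=(X_test, Y_test), ...) with the arrays from the question.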
