
I want to do transfer learning with simple MLP models. First I train a feed-forward network with one hidden layer on a large dataset:

from keras.models import Sequential
from keras.layers import Dense

net = Sequential()
net.add(Dense(500, input_dim=2048, kernel_initializer='normal', activation='relu'))
net.add(Dense(1, kernel_initializer='normal'))
net.compile(loss='mean_absolute_error', optimizer='adam')
net.fit(x_transf,
        y_transf,
        epochs=1000,
        batch_size=8,
        verbose=0)

Then I want to reuse the single hidden layer as the first layer of a new network, to which I want to add a second hidden layer. The reused layer should not be trainable.

idx = 1  # index of desired layer
input_shape = net.layers[idx].get_input_shape_at(0) # get the input shape of desired layer
input_layer = net.layers[idx]
input_layer.trainable = False

transf_model = Sequential()
transf_model.add(input_layer)
transf_model.add(Dense(input_shape[1], activation='relu'))
transf_model.compile(loss='mean_absolute_error', optimizer='adam')
transf_model.fit(x, 
                 y,
                 epochs=10, 
                 batch_size=8, 
                 verbose=0)

EDIT: The above code returns:

ValueError: Error when checking target: expected dense_9 to have shape (None, 500) but got array with shape (436, 1)

What's the trick to make this work?

tevang
  • The shared layer you used in the second model is expecting 2D inputs, but you are feeding the model with 3D inputs?! – today Jan 05 '19 at 19:55
  • Please, somebody? It must be fairly simple to answer for someone who is familiar with Keras. – tevang Jan 07 '19 at 10:26
  • Did you read my comment? – today Jan 07 '19 at 10:40
  • @today Read my EDIT. – tevang Jan 07 '19 at 13:00
  • The last `Dense` layer in the second model should have 1 unit, not `input_shape[1]` units, right? I think you are making it a little complicated. There are better ways of doing this. – today Jan 07 '19 at 13:06
  • Could you please write an example? What changes would you do to my sample code? – tevang Jan 07 '19 at 13:52
  • Sure, but could you explain what would be different in the second model? How many layers does it have? And what is the expected output shape of the model, is it `(None, 500)` or `(None, 1)`? – today Jan 07 '19 at 14:31
  • Just add an extra Dense layer with 800 neurons. The output should be (None, 1); I forgot to add this before `transf_model.compile(...)`: `transf_model.add(Dense(1, kernel_initializer='normal'))` – tevang Jan 07 '19 at 14:33
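
A minimal sketch of the corrected Sequential version that the comment thread converges on (the frozen first hidden layer, an extra 800-unit layer, and the forgotten `Dense(1)` output; the relu activation on the new layer is an assumption, mirroring the first network):

shared_layer = net.layers[0]    # the 500-unit hidden layer, not the output layer
shared_layer.trainable = False  # freeze the transferred layer

transf_model = Sequential()
transf_model.add(shared_layer)
transf_model.add(Dense(800, activation='relu'))          # extra hidden layer (activation assumed)
transf_model.add(Dense(1, kernel_initializer='normal'))  # the forgotten output layer
transf_model.compile(loss='mean_absolute_error', optimizer='adam')
transf_model.fit(x, y, epochs=10, batch_size=8, verbose=0)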

1 Answer


I would simply use the Functional API to build such a model:

from keras.models import Model
from keras.layers import Input, Dense

shared_layer = net.layers[0]    # you want the first hidden layer, so index = 0
shared_layer.trainable = False

inp = Input(shape=(2048,))      # the shape of one input sample
x = shared_layer(inp)
x = Dense(800, activation='relu')(x)            # extra hidden layer (activation assumed, mirroring the first network)
out = Dense(1, kernel_initializer='normal')(x)  # the output layer from the comments

model = Model(inp, out)

# the rest is the same...
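
That is, assuming the same loss, optimizer, and fit settings as in the question:

model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(x, y, epochs=10, batch_size=8, verbose=0)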
today
  • What if the 1st network had 2 hidden layers and I wanted to transfer both of them to the new network? – tevang Jan 07 '19 at 15:21
  • @tevang It is the same: you get the layers and apply them on tensors (i.e. output of previous layers). For example, `x = shared_layer1(inp)` and then `x = shared_layer2(x)` (see the sketch after these comments). – today Jan 07 '19 at 15:23
  • Yes, that's it! Thank you very much! – tevang Jan 07 '19 at 18:18
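
A minimal sketch of the two-hidden-layer case from the last comments, assuming a base network `net` that (unlike the one in the question) has two hidden layers before its output:

from keras.models import Model
from keras.layers import Input, Dense

shared_layer1 = net.layers[0]    # first hidden layer of the base net
shared_layer2 = net.layers[1]    # second hidden layer (hypothetical here)
shared_layer1.trainable = False
shared_layer2.trainable = False

inp = Input(shape=(2048,))
x = shared_layer1(inp)           # apply the first transferred layer
x = shared_layer2(x)             # then the second, as in the comment
out = Dense(1, kernel_initializer='normal')(x)

model = Model(inp, out)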