
How can I average the weights of several Keras models that share the same architecture but were trained from different initialisations?

Currently my code looks something like this:

import keras
from keras.models import Model
from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                          MaxPool2D, Flatten, Dense, Dropout)
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras.utils import plot_model

datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=2.0/28,
                             height_shift_range=2.0/28
                            )

epochs = 40
lr = 1.234e-3
optimizer = Adam(lr=lr)

main_input = Input(shape=(28, 28, 1), name='main_input')

sub_models = []

for i in range(5):

    x = Conv2D(32, kernel_size=(3,3), strides=1)(main_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(64, kernel_size=(3,3), strides=1)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(64, kernel_size=(3,3), strides=1)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = Flatten()(x)

    x = Dense(1024)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.1)(x)

    x = Dense(256)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.4)(x)

    x = Dense(10, activation='softmax')(x)

    sub_models.append(x)

main_output = keras.layers.average(sub_models)

model = Model(inputs=[main_input], outputs=[main_output])

model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
              optimizer=optimizer)

print(model.summary())

plot_model(model, to_file='model.png')

filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
tensorboard = TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)
callbacks = [checkpoint, tensorboard]

model.fit_generator(datagen.flow(X_train, y_train, batch_size=128),
                    steps_per_epoch=len(X_train) // 128,
                    epochs=epochs,
                    callbacks=callbacks,
                    verbose=1,
                    validation_data=(X_test, y_test))

So right now I am averaging only the outputs of the last layer, but what I want is to average the weights in all layers after training each model separately.

Thanks!

  • You simply cannot average the weights of neural networks. – Dr. Snoopy Jan 11 '18 at 16:54
  • What have you tried so far? What if you call `keras.layers.average()` between each layer? – DarkCygnus Jan 11 '18 at 17:11
  • I don't want to average between each layer, because I want to train each model separately. Averaging after each layer is something different, and so is averaging the models at the last layer before training. – Miłosz Bednarzak Jan 12 '18 at 17:27
  • @MatiasValdenegro yes you can: https://arxiv.org/abs/1803.05407 – Scratch Jul 02 '18 at 10:03
  • @Scratch The paper doesn't support the idea that is asked in this question; it's about averaging over SGD trajectories, and it appeared after this question was asked. – Dr. Snoopy Jul 02 '18 at 11:50
  • True. Averaging weights from models trained with different initialisations would make little sense. I just wanted to point out that averaging weights can be of interest in some specific cases. – Scratch Jul 06 '18 at 09:07
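
Following up on the SWA paper (arXiv:1803.05407) linked in the comments above: that technique averages the weights of a single model at different points along its own training trajectory, rather than across independently initialised models. Below is a minimal sketch of that idea as a Keras callback; the class name `SWA` and the `start_epoch` parameter are made up for illustration:

from keras.callbacks import Callback

class SWA(Callback):
    def __init__(self, start_epoch):
        super(SWA, self).__init__()
        self.start_epoch = start_epoch  # first epoch to include in the average
        self.swa_weights = None
        self.n_averaged = 0

    def on_epoch_end(self, epoch, logs=None):
        if epoch < self.start_epoch:
            return
        weights = self.model.get_weights()
        if self.swa_weights is None:
            self.swa_weights = weights
        else:
            # Incremental mean over epochs: avg += (w - avg) / (n + 1)
            self.swa_weights = [avg + (w - avg) / (self.n_averaged + 1)
                                for avg, w in zip(self.swa_weights, weights)]
        self.n_averaged += 1

    def on_train_end(self, logs=None):
        # Swap the running average in as the final weights.
        if self.swa_weights is not None:
            self.model.set_weights(self.swa_weights)

Note that the paper also recommends re-estimating batch-normalisation statistics after swapping in the averaged weights; this sketch omits that step.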

2 Answers


So let's assume that `models` is a collection of your models. First, collect all the weights:

weights = [model.get_weights() for model in models]

Now create a new set of averaged weights:

import numpy

new_weights = list()

# weights is a list with one entry per model; each entry is itself the
# list of per-layer arrays returned by get_weights(). zip(*weights)
# groups the corresponding layer arrays of all models together.
for weights_list_tuple in zip(*weights):
    new_weights.append(
        [numpy.array(weights_).mean(axis=0)
         for weights_ in zip(*weights_list_tuple)])

All that is left is to set these weights on a new model:

new_model.set_weights(new_weights)
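
For context, here is a minimal end-to-end sketch of the whole procedure. `build_model()` is a hypothetical helper (not defined in this thread) that returns a freshly initialised, compiled copy of the same architecture:

import numpy as np

models = []
for i in range(5):
    m = build_model()  # hypothetical: returns a fresh compiled model
    m.fit(X_train, y_train, epochs=40, verbose=0)
    models.append(m)

weights = [m.get_weights() for m in models]

# A more direct formulation of the averaging step: stack each layer's
# arrays across models and take the element-wise mean in one go.
new_weights = [np.array(layer_tuple).mean(axis=0)
               for layer_tuple in zip(*weights)]

new_model = build_model()
new_model.set_weights(new_weights)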

Of course, averaging weights might be a bad idea, but if you want to try it, this is the approach to follow.

Marcin Możejko
  • Why is that a bad idea? I was inspired by http://cs231n.github.io/neural-networks-3/#ensemble where it is said that it's a good idea ;) – Miłosz Bednarzak Jan 12 '18 at 17:28
  • Just to give you one example of why this might go wrong: take a model and permute all its filters in a consistent manner. The network will be mathematically equivalent, but the average could differ a lot from the original function (see the sketch after these comments). And I'm not claiming that this is a bad idea - I claim that it might be ;) – Marcin Możejko Jan 12 '18 at 17:31
  • I have another issue. I get `'NoneType' object has no attribute 'evaluate'`. I found that it is connected to fit_generator, but I don't know how to fix it. Can you help? Thanks! – Miłosz Bednarzak Jan 12 '18 at 20:06
  • https://github.com/miloszbednarzak/mnist/blob/master/mnist_averaged.ipynb – Miłosz Bednarzak Jan 13 '18 at 00:05
  • Change this line `new_model = model.set_weights(new_weights)` to `model.set_weights(new_weights)` – Marcin Możejko Jan 14 '18 at 22:44
  • I made an implementation of that paper: https://github.com/simon-larsson/keras-swa – Simon Larsson Oct 04 '19 at 15:36
  • Great answer. I found that `sum(weights_) / len(weights_)` instead of `numpy.array(weights_).mean(axis=0)` speeds up the function (`36 ms` instead of `91 ms` with a 4-layer ANN and 1000 hidden neurons per layer). Is there a way to improve it further? I tried multiprocessing but no luck so far – maurock Feb 12 '20 at 19:50
  • This is the approach that can be used for K-fold cross validation, right? – Mike de Klerk Mar 07 '20 at 14:00
  • It's a good idea to average weights taken from the same model during different epochs (usually the last few epochs). It's a bad idea to average the weights of different models (trained separately). – Serhiy Jun 08 '20 at 10:22
  • The error I get here is: `TypeError: cannot perform reduce with flexible type` – Koti Apr 27 '22 at 22:47
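
To make the permutation argument from the comments concrete, here is a small self-contained NumPy sketch (the toy two-layer network and all the numbers are made up for illustration). Permuting the hidden units of a network, together with the matching rows of the next layer, leaves its function unchanged, yet averaging the original weights with the permuted ones yields a different function:

import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer ReLU network: x -> relu(x @ W1 + b1) @ W2
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2 = rng.normal(size=(8, 2))

def forward(x, W1, b1, W2):
    return np.maximum(x @ W1 + b1, 0.0) @ W2

# Permute the hidden units consistently: the function is unchanged.
perm = rng.permutation(8)
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(3, 4))
print(np.allclose(forward(x, W1, b1, W2), forward(x, W1p, b1p, W2p)))  # True

# But averaging the original with its permuted twin changes the function,
# even though both copies compute exactly the same thing.
Wa1, ba1, Wa2 = (W1 + W1p) / 2, (b1 + b1p) / 2, (W2 + W2p) / 2
print(np.allclose(forward(x, W1, b1, W2), forward(x, Wa1, ba1, Wa2)))  # False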

I can't comment on the accepted answer, but to make it work on TensorFlow 2.0 with tf.keras I had to turn the inner list in the loop into a numpy array:

import numpy as np

new_weights = list()
for weights_list_tuple in zip(*weights):
    new_weights.append(
        np.array([np.array(w).mean(axis=0) for w in zip(*weights_list_tuple)])
    )

If the input models need to be weighted differently, `np.array(w).mean(axis=0)` needs to be replaced with `np.average(np.array(w), axis=0, weights=relative_weights)`, where `relative_weights` is an array with one weight factor per model.
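
As a quick illustration of how `np.average` applies those factors (the numbers here are made up), each entry of the result is the weighted mean across models:

import numpy as np

# Three "models", each contributing one scalar parameter; the first
# model counts for half of the average: 0.5*1.0 + 0.25*2.0 + 0.25*3.0
stacked = np.array([1.0, 2.0, 3.0])
relative_weights = [0.5, 0.25, 0.25]
print(np.average(stacked, axis=0, weights=relative_weights))  # 1.75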

ursusminimus
  • I get a `TypeError: zip argument #5 must support iteration`. Why is this happening? – Koti Apr 27 '22 at 21:52