
How can I average the weights of several Keras models that share the same architecture but were trained from different initialisations?

Currently my code looks something like this:

import keras
from keras.models import Model
from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                          MaxPool2D, Flatten, Dense, Dropout)
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras.utils import plot_model

datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=2.0/28,
                             height_shift_range=2.0/28
                            )

epochs = 40
lr = 1.234e-3
optimizer = Adam(lr=lr)

main_input = Input(shape=(28, 28, 1), name='main_input')

sub_models = []

for i in range(5):

    x = Conv2D(32, kernel_size=(3,3), strides=1)(main_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(64, kernel_size=(3,3), strides=1)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(64, kernel_size=(3,3), strides=1)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = Flatten()(x)

    x = Dense(1024)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.1)(x)

    x = Dense(256)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.4)(x)

    x = Dense(10, activation='softmax')(x)

    sub_models.append(x)

main_output = keras.layers.average(sub_models)

model = Model(inputs=[main_input], outputs=[main_output])

model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
              optimizer=optimizer)

print(model.summary())

plot_model(model, to_file='model.png')

filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
tensorboard = TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)
callbacks = [checkpoint, tensorboard]

model.fit_generator(datagen.flow(X_train, y_train, batch_size=128),
                    steps_per_epoch=len(X_train) // 128,
                    epochs=epochs,
                    callbacks=callbacks,
                    verbose=1,
                    validation_data=(X_test, y_test))

So right now I am averaging only the outputs of the last layer, but what I want is to average the weights in all layers after training each model separately.

Thanks!

  • You simply cannot average the weights of neural networks. – Dr. Snoopy Jan 11 '18 at 16:54
  • What have you tried so far? What if you call `keras.layers.average()` between each layer? – DarkCygnus Jan 11 '18 at 17:11
  • I don't want to average between each layer, because I want to train each model separately. Averaging after each layer is something different, and so is averaging the models at the last layer before training. – Miłosz Bednarzak Jan 12 '18 at 17:27
  • @MatiasValdenegro yes you can: https://arxiv.org/abs/1803.05407 – Scratch Jul 02 '18 at 10:03
  • @Scratch The paper doesn't support the idea that is asked in this question; it's about averaging over SGD trajectories, and it appeared after this question was asked. – Dr. Snoopy Jul 02 '18 at 11:50
  • True. Averaging weights from models trained with different initialisations would make little sense. I just wanted to point out that averaging weights can be of interest in some specific cases. – Scratch Jul 06 '18 at 09:07
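
Following up on the SWA paper (arXiv:1803.05407) linked in the comments above: that technique averages the weights of a single model at different points along its own training trajectory, rather than across independently initialised models. Below is a minimal sketch of that idea as a Keras callback; the class name `SWA` and the `start_epoch` parameter are made up for illustration:

from keras.callbacks import Callback

class SWA(Callback):
    def __init__(self, start_epoch):
        super(SWA, self).__init__()
        self.start_epoch = start_epoch  # first epoch to include in the average
        self.swa_weights = None
        self.n_averaged = 0

    def on_epoch_end(self, epoch, logs=None):
        if epoch < self.start_epoch:
            return
        weights = self.model.get_weights()
        if self.swa_weights is None:
            self.swa_weights = weights
        else:
            # Incremental mean over epochs: avg += (w - avg) / (n + 1)
            self.swa_weights = [avg + (w - avg) / (self.n_averaged + 1)
                                for avg, w in zip(self.swa_weights, weights)]
        self.n_averaged += 1

    def on_train_end(self, logs=None):
        # Swap the running average in as the final weights.
        if self.swa_weights is not None:
            self.model.set_weights(self.swa_weights)

Note that the paper also recommends re-estimating batch-normalisation statistics after swapping in the averaged weights; this sketch omits that step.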

2 Answers


So let's assume that `models` is a collection of your models. First, collect all the weights:

weights = [model.get_weights() for model in models]

Now create a new set of averaged weights:

import numpy

new_weights = list()

# weights is a list with one entry per model; each entry is itself the
# list of per-layer arrays returned by get_weights(). zip(*weights)
# groups the corresponding layer arrays of all models together.
for weights_list_tuple in zip(*weights):
    new_weights.append(
        [numpy.array(weights_).mean(axis=0)
         for weights_ in zip(*weights_list_tuple)])

All that is left is to set these weights on a new model:

new_model.set_weights(new_weights)
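
For context, here is a minimal end-to-end sketch of the whole procedure. `build_model()` is a hypothetical helper (not defined in this thread) that returns a freshly initialised, compiled copy of the same architecture:

import numpy as np

models = []
for i in range(5):
    m = build_model()  # hypothetical: returns a fresh compiled model
    m.fit(X_train, y_train, epochs=40, verbose=0)
    models.append(m)

weights = [m.get_weights() for m in models]

# A more direct formulation of the averaging step: stack each layer's
# arrays across models and take the element-wise mean in one go.
new_weights = [np.array(layer_tuple).mean(axis=0)
               for layer_tuple in zip(*weights)]

new_model = build_model()
new_model.set_weights(new_weights)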

Of course, averaging weights might be a bad idea, but if you want to try it, this is the approach to follow.

Marcin Możejko
  • Why is that a bad idea? I was inspired by http://cs231n.github.io/neural-networks-3/#ensemble where it is said that it's a good idea ;) – Miłosz Bednarzak Jan 12 '18 at 17:28
  • Just to give you one example of why this might go wrong: take a model and permute all its filters in a consistent manner. The network will be mathematically equivalent, but the average could differ a lot from the original function (see the sketch after these comments). And I'm not claiming that this is a bad idea - I claim that it might be ;) – Marcin Możejko Jan 12 '18 at 17:31
  • I have another issue. I get `'NoneType' object has no attribute 'evaluate'`. I found that it is connected to fit_generator, but I don't know how to fix it. Can you help? Thanks! – Miłosz Bednarzak Jan 12 '18 at 20:06
  • https://github.com/miloszbednarzak/mnist/blob/master/mnist_averaged.ipynb – Miłosz Bednarzak Jan 13 '18 at 00:05
  • Change this line `new_model = model.set_weights(new_weights)` to `model.set_weights(new_weights)` – Marcin Możejko Jan 14 '18 at 22:44
  • I made an implementation of that paper: https://github.com/simon-larsson/keras-swa – Simon Larsson Oct 04 '19 at 15:36
  • Great answer. I found that `sum(weights_) / len(weights_)` instead of `numpy.array(weights_).mean(axis=0)` speeds up the function (`36 ms` instead of `91 ms` with a 4-layer ANN and 1000 hidden neurons per layer). Is there a way to improve it further? I tried multiprocessing but no luck so far – maurock Feb 12 '20 at 19:50
  • This is the approach that can be used for K-fold cross validation, right? – Mike de Klerk Mar 07 '20 at 14:00
  • It's a good idea to average weights taken from the same model during different epochs (usually the last few epochs). It's a bad idea to average the weights of different models (trained separately). – Serhiy Jun 08 '20 at 10:22
  • The error I get here is: `TypeError: cannot perform reduce with flexible type` – Koti Apr 27 '22 at 22:47
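
To make the permutation argument from the comments concrete, here is a small self-contained NumPy sketch (the toy two-layer network and all the numbers are made up for illustration). Permuting the hidden units of a network, together with the matching rows of the next layer, leaves its function unchanged, yet averaging the original weights with the permuted ones yields a different function:

import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer ReLU network: x -> relu(x @ W1 + b1) @ W2
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2 = rng.normal(size=(8, 2))

def forward(x, W1, b1, W2):
    return np.maximum(x @ W1 + b1, 0.0) @ W2

# Permute the hidden units consistently: the function is unchanged.
perm = rng.permutation(8)
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(3, 4))
print(np.allclose(forward(x, W1, b1, W2), forward(x, W1p, b1p, W2p)))  # True

# But averaging the original with its permuted twin changes the function,
# even though both copies compute exactly the same thing.
Wa1, ba1, Wa2 = (W1 + W1p) / 2, (b1 + b1p) / 2, (W2 + W2p) / 2
print(np.allclose(forward(x, W1, b1, W2), forward(x, Wa1, ba1, Wa2)))  # False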

I can't comment on the accepted answer, but to make it work on TensorFlow 2.0 with tf.keras I had to turn the inner list in the loop into a numpy array:

import numpy as np

new_weights = list()
for weights_list_tuple in zip(*weights):
    new_weights.append(
        np.array([np.array(w).mean(axis=0) for w in zip(*weights_list_tuple)])
    )

If the input models need to be weighted differently, `np.array(w).mean(axis=0)` needs to be replaced with `np.average(np.array(w), axis=0, weights=relative_weights)`, where `relative_weights` is an array with one weight factor per model.
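
As a quick illustration of how `np.average` applies those factors (the numbers here are made up), each entry of the result is the weighted mean across models:

import numpy as np

# Three "models", each contributing one scalar parameter; the first
# model counts for half of the average: 0.5*1.0 + 0.25*2.0 + 0.25*3.0
stacked = np.array([1.0, 2.0, 3.0])
relative_weights = [0.5, 0.25, 0.25]
print(np.average(stacked, axis=0, weights=relative_weights))  # 1.75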

ursusminimus
  • I get a `TypeError: zip argument #5 must support iteration`. Why is this happening? – Koti Apr 27 '22 at 21:52