
How can I train the batch norm layer without using any keras compile/fit methods? Typically layers have losses that are accessible; here the losses property is empty.

UPDATE:

It seems like there is a lot of confusion about this, and even the way BatchNorm is implemented is pretty confusing.

First, there is only one way to train the online parameters (the ones used in training=False mode to shift and scale the features): call the layer in training=True mode. And if you NEVER want to use the "batch" part of batch normalization (i.e. you just want an online normalizer that trains itself with a Normal log-prob loss), you basically can't do this in a single call AFAIK.

Calling the layer with training=False does not update the params. Calling it with training=True updates the params, but then you get the batch-normalized output (which does not use the online loc and scale).

import numpy as np
import tensorflow as tf

class Model(tf.keras.models.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(4)
        self.bn = tf.keras.layers.BatchNormalization()
    def call(self, x, training=False):
        x = self.dense(x)
        x = self.bn(x, training=training)
        return x

model = Model()    
x = 10 * np.random.randn(30, 4).astype(np.float32)

print(tf.math.reduce_std(model(x)))
tf.keras.backend.set_learning_phase(1)
print(tf.math.reduce_std(model(x)))
print(tf.math.reduce_std(model(x)))
tf.keras.backend.set_learning_phase(0)
print(tf.math.reduce_std(model(x)))
print(tf.math.reduce_std(model(x)))


tf.Tensor(9.504262, shape=(), dtype=float32)
tf.Tensor(0.99999136, shape=(), dtype=float32)
tf.Tensor(0.99999136, shape=(), dtype=float32)
tf.Tensor(5.4472375, shape=(), dtype=float32)
tf.Tensor(5.4472375, shape=(), dtype=float32)
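
One workaround for the single-call limitation (a minimal sketch, not an official API pattern; the momentum value below is just illustrative): call the layer twice per batch, once with training=True purely as a side effect to update the moving statistics, discarding the batch-normalized output, and once with training=False to normalize with those moving statistics. This costs roughly two forward passes through the layer per step.

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization(momentum=0.9)
x = 10 * np.random.randn(30, 4).astype(np.float32)

_ = bn(x, training=True)      # side effect only: updates moving_mean / moving_variance
y = bn(x, training=False)     # normalizes with the (partially converged) moving stats

print(tf.math.reduce_std(y))  # approaches 1.0 only as the moving stats converge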

UPDATE:

Showing that keras layers do have losses sometimes (when sub-objectives such as regularization exist):

In [335]: l = tf.keras.layers.Dense(8, kernel_regularizer=tf.keras.regularizers.L1L2())

In [336]: l(np.random.randn(2, 4))

Out[336]:
<tf.Tensor: id=2521999, shape=(2, 8), dtype=float32, numpy=
array([[ 1.1332406 ,  0.32000083,  0.8104123 ,  0.5066328 ,  0.35904446, -1.4265257 ,  1.3057183 ,  0.34458983],
       [-0.23246719, -0.46841025,  0.9706465 ,  0.42356712,  1.705613  , -0.08619405, -0.5261058 , -1.1696107 ]], dtype=float32)>

In [337]: l.losses
Out[337]: [<tf.Tensor: id=2522000, shape=(), dtype=float32, numpy=0.0>]

In [338]: l = tf.keras.layers.Dense(8)

In [339]: l(np.random.randn(2, 4))

Out[339]:
<tf.Tensor: id=2522028, shape=(2, 8), dtype=float32, numpy=
array([[ 1.0674231 , -0.13423748,  0.01775402,  2.5400681 , -0.53589094,  1.4460006 , -1.7197075 ,  0.3285858 ],
       [ 2.2171447 , -1.7448915 ,  0.4758569 ,  0.58695656,  0.32054698,  0.7813705 , -2.3022552 ,  0.44061095]], dtype=float32)>

In [340]: l.losses
Out[340]: []
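
For completeness, and coming back to the original question about avoiding compile: in a custom training loop those collected regularization losses are added to the task loss by hand. A minimal sketch with a made-up tiny model and an MSE loss (the model, shapes and hyperparameters below are illustrative only, not from the question):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, kernel_regularizer=tf.keras.regularizers.L1L2(l1=0.01, l2=0.01)),
    tf.keras.layers.Dense(1),
])
opt = tf.keras.optimizers.Adam()

x = np.random.randn(32, 4).astype(np.float32)
y = np.random.randn(32, 1).astype(np.float32)

with tf.GradientTape() as tape:
    pred = model(x, training=True)              # training=True also updates any BatchNorm moving stats
    loss = tf.reduce_mean(tf.square(pred - y))  # task loss
    loss += tf.add_n(model.losses)              # regularization losses collected by the layers
grads = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grads, model.trainable_variables))
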
mathtick
  • Keras layers have never had losses, so no idea of what concept you are talking about, can you clarify? – Dr. Snoopy Oct 02 '19 at 11:54
  • Keras layers have losses. Will paste simple code to show you in the update. – mathtick Oct 02 '19 at 13:18
  • Batch norm has a loss that must be trained ... i.e. the training=False case where learned params are used to shift (loc) and scale the features. If it is trained via a side-channel online update, that should be explicit in the docs. – mathtick Oct 02 '19 at 13:25
  • No, Batch norm does not have a loss at all. What you are probably looking for is the update op for the batch norm population mean and std – Dr. Snoopy Oct 02 '19 at 13:26
  • Ah ok, yes that is what I am looking for. Seems weird they implement an "op" for the layer loss. For example, if you implement this kind of thing with a bijector you would probably use a loss but maybe batch norm came before all of that refactoring? – mathtick Oct 02 '19 at 14:19
  • Also, what is the update op that we should be calling in tf2.0 with no compiled keras model? Reading through the code now, as it doesn't seem obvious from the docs. – mathtick Oct 02 '19 at 14:29
  • @MatiasValdenegro do you by any chance know where this op is to update the params used in test? Using a vanilla Dense layer might make more sense in the end I guess. – mathtick Oct 02 '19 at 20:20
  • This is basically the reason: https://github.com/tensorflow/tensorflow/issues/23873 – mathtick Oct 03 '19 at 09:13
  • And this: https://pgaleone.eu/tensorflow/keras/2019/01/19/keras-not-yet-interface-to-tensorflow/ – mathtick Oct 03 '19 at 09:25

1 Answer


BatchNorm does train, but it does not have a loss. It just tracks the mean and std of consecutive batches in a weighted moving average. There is no loss/gradient involved.
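
To make that concrete, here is a rough sketch of the exponential-moving-average update the layer applies on each training=True call, assuming the default momentum of 0.99 (the exact internal arithmetic may differ slightly, but it is an assignment, not a gradient step):

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()   # default momentum = 0.99
x = 10 * np.random.randn(30, 4).astype(np.float32)

bn.build(x.shape)                           # creates moving_mean (zeros) and moving_variance (ones)
old_mean = bn.moving_mean.numpy()

_ = bn(x, training=True)                    # one update step: an assignment, no gradient

expected = 0.99 * old_mean + 0.01 * x.mean(axis=0)   # momentum * old + (1 - momentum) * batch mean
print(bn.moving_mean.numpy())
print(expected)                                      # should roughly match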

ben
  • The online update is conceptually a gradient for a loss. We should get the docs updated to explicitly state that this layer is not "keras loss" based and implements its own subtask update via a custom update op. I think this becomes particularly unnatural for those of us coming from TF2 and TFP (bijectors etc). For example, RealNVP is very similar in structure to a batch norm in training=False mode. – mathtick Oct 02 '19 at 14:21
  • See here: https://pgaleone.eu/tensorflow/keras/2019/01/19/keras-not-yet-interface-to-tensorflow/ – mathtick Oct 03 '19 at 09:25