
How can I train the batch norm layer without using any keras compile/fit methods? Typically layers have losses that are accessible; here the losses property is empty.

UPDATE:

It seems like there is a lot of confusion about this, and even the way BatchNorm is implemented is pretty confusing.

First, there is only one way to train the online parameters (the ones used in training=False mode to shift and scale the features): call the layer in training=True mode. And if you NEVER want to use the "batch" part of batch normalization (i.e. you just want an online normalizer that trains itself with a Normal log-prob loss), you basically can't do this in a single call AFAIK.

Calling the layer with training=False does not update the params. Calling it with training=True updates the params, but then you get the batch-normalized output (which does not use the online loc and scale).

import numpy as np
import tensorflow as tf

class Model(tf.keras.models.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(4)
        self.bn = tf.keras.layers.BatchNormalization()
    def call(self, x, training=False):
        x = self.dense(x)
        x = self.bn(x, training=training)
        return x

model = Model()    
x = 10 * np.random.randn(30, 4).astype(np.float32)

print(tf.math.reduce_std(model(x)))
tf.keras.backend.set_learning_phase(1)
print(tf.math.reduce_std(model(x)))
print(tf.math.reduce_std(model(x)))
tf.keras.backend.set_learning_phase(0)
print(tf.math.reduce_std(model(x)))
print(tf.math.reduce_std(model(x)))


tf.Tensor(9.504262, shape=(), dtype=float32)
tf.Tensor(0.99999136, shape=(), dtype=float32)
tf.Tensor(0.99999136, shape=(), dtype=float32)
tf.Tensor(5.4472375, shape=(), dtype=float32)
tf.Tensor(5.4472375, shape=(), dtype=float32)
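
One workaround for the single-call limitation (a minimal sketch, not an official API pattern; the momentum value below is just illustrative): call the layer twice per batch, once with training=True purely as a side effect to update the moving statistics, discarding the batch-normalized output, and once with training=False to normalize with those moving statistics. This costs roughly two forward passes through the layer per step.

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization(momentum=0.9)
x = 10 * np.random.randn(30, 4).astype(np.float32)

_ = bn(x, training=True)      # side effect only: updates moving_mean / moving_variance
y = bn(x, training=False)     # normalizes with the (partially converged) moving stats

print(tf.math.reduce_std(y))  # approaches 1.0 only as the moving stats converge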

UPDATE:

Showing that keras layers do have losses sometimes (when sub-objectives such as regularization exist):

In [335]: l = tf.keras.layers.Dense(8, kernel_regularizer=tf.keras.regularizers.L1L2())

In [336]: l(np.random.randn(2, 4))

Out[336]:
<tf.Tensor: id=2521999, shape=(2, 8), dtype=float32, numpy=
array([[ 1.1332406 ,  0.32000083,  0.8104123 ,  0.5066328 ,  0.35904446, -1.4265257 ,  1.3057183 ,  0.34458983],
       [-0.23246719, -0.46841025,  0.9706465 ,  0.42356712,  1.705613  , -0.08619405, -0.5261058 , -1.1696107 ]], dtype=float32)>

In [337]: l.losses
Out[337]: [<tf.Tensor: id=2522000, shape=(), dtype=float32, numpy=0.0>]

In [338]: l = tf.keras.layers.Dense(8)

In [339]: l(np.random.randn(2, 4))

Out[339]:
<tf.Tensor: id=2522028, shape=(2, 8), dtype=float32, numpy=
array([[ 1.0674231 , -0.13423748,  0.01775402,  2.5400681 , -0.53589094,  1.4460006 , -1.7197075 ,  0.3285858 ],
       [ 2.2171447 , -1.7448915 ,  0.4758569 ,  0.58695656,  0.32054698,  0.7813705 , -2.3022552 ,  0.44061095]], dtype=float32)>

In [340]: l.losses
Out[340]: []
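
For completeness, and coming back to the original question about avoiding compile: in a custom training loop those collected regularization losses are added to the task loss by hand. A minimal sketch with a made-up tiny model and an MSE loss (the model, shapes and hyperparameters below are illustrative only, not from the question):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, kernel_regularizer=tf.keras.regularizers.L1L2(l1=0.01, l2=0.01)),
    tf.keras.layers.Dense(1),
])
opt = tf.keras.optimizers.Adam()

x = np.random.randn(32, 4).astype(np.float32)
y = np.random.randn(32, 1).astype(np.float32)

with tf.GradientTape() as tape:
    pred = model(x, training=True)              # training=True also updates any BatchNorm moving stats
    loss = tf.reduce_mean(tf.square(pred - y))  # task loss
    loss += tf.add_n(model.losses)              # regularization losses collected by the layers
grads = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grads, model.trainable_variables))
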
mathtick
  • Keras layers have never had losses, so no idea of what concept you are talking about, can you clarify? – Dr. Snoopy Oct 02 '19 at 11:54
  • Keras layers have losses. Will paste simple code to show you in the update. – mathtick Oct 02 '19 at 13:18
  • Batch norm has a loss that must be trained ... i.e. the training=False case where learned params are used to shift (loc) and scale the features. If it is trained via a side-channel online update, that should be explicit in the docs. – mathtick Oct 02 '19 at 13:25
  • No, Batch norm does not have a loss at all. What you are probably looking for is the update op for the batch norm population mean and std – Dr. Snoopy Oct 02 '19 at 13:26
  • Ah ok, yes that is what I am looking for. Seems weird they implement an "op" for the layer loss. For example, if you implement this kind of thing with a bijector you would probably use a loss but maybe batch norm came before all of that refactoring? – mathtick Oct 02 '19 at 14:19
  • Also, what is the update op that we should be calling in tf2.0 with no compiled keras model? Reading through the code now, as it doesn't seem obvious from the docs. – mathtick Oct 02 '19 at 14:29
  • @MatiasValdenegro do you by any chance know where this op is to update the params used in test? Using a vanilla Dense layer might make more sense in the end I guess. – mathtick Oct 02 '19 at 20:20
  • This is basically the reason: https://github.com/tensorflow/tensorflow/issues/23873 – mathtick Oct 03 '19 at 09:13
  • And this: https://pgaleone.eu/tensorflow/keras/2019/01/19/keras-not-yet-interface-to-tensorflow/ – mathtick Oct 03 '19 at 09:25

1 Answer


BatchNorm does train, but it does not have a loss. It just tracks the mean and std of consecutive batches in a weighted moving average. There is no loss/gradient involved.
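
To make that concrete, here is a rough sketch of the exponential-moving-average update the layer applies on each training=True call, assuming the default momentum of 0.99 (the exact internal arithmetic may differ slightly, but it is an assignment, not a gradient step):

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()   # default momentum = 0.99
x = 10 * np.random.randn(30, 4).astype(np.float32)

bn.build(x.shape)                           # creates moving_mean (zeros) and moving_variance (ones)
old_mean = bn.moving_mean.numpy()

_ = bn(x, training=True)                    # one update step: an assignment, no gradient

expected = 0.99 * old_mean + 0.01 * x.mean(axis=0)   # momentum * old + (1 - momentum) * batch mean
print(bn.moving_mean.numpy())
print(expected)                                      # should roughly match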

ben
  • The online update is conceptually a gradient for a loss. We should get the docs updated to explicitly state that this layer is not "keras loss" based and implements its own subtask update via a custom update op. I think this becomes particularly unnatural for those of us coming from TF2 and TFP (bijectors etc). For example, RealNVP is very similar in structure to a batch norm in training=False mode. – mathtick Oct 02 '19 at 14:21
  • See here: https://pgaleone.eu/tensorflow/keras/2019/01/19/keras-not-yet-interface-to-tensorflow/ – mathtick Oct 03 '19 at 09:25