
I would like to ask for some help with creating a custom layer. What I am trying to do is actually quite simple: generating an output layer with 'stateful' variables, i.e. tensors whose value is updated at each batch.

To make everything clearer, here is a snippet of what I would like to do:

def call(self, inputs):

    c = self.constant
    m = self.extra_constant

    update = inputs * m + c
    X_new = self.X_old + update

    outputs = X_new

    self.X_old = X_new

    return outputs

The idea here is quite simple:

  • X_old is initialized to 0 in __init__(self, ...)
  • update is computed as a function of the inputs to the layer
  • the output of the layer is computed (i.e. X_new)
  • the value of X_old is set equal to X_new so that, at the next batch, X_old is no longer equal to zero but equal to X_new from the previous batch.

I have found out that K.update does the job, as shown in the example:

 X_new = K.update(self.X_old, self.X_old + update)

The problem here is that, if I then try to define the outputs of the layer as:

outputs = X_new

return outputs

I get the following error when I call model.fit():

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have 
gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

I keep getting this error even though I set layer.trainable = False and did not define any weights or biases for the layer. On the other hand, if I just do self.X_old = X_new, the value of X_old does not get updated.

Does anyone have a solution for implementing this? I believe it should not be that hard, since stateful RNNs work in a 'similar' way.

Thanks in advance for your help!

d_gg

1 Answer


Defining a custom layer can get confusing at times. Some of the methods you override are called only once, even though you may get the impression that, just like in many other OO libraries/frameworks, they will be called many times.

Here is what I mean: when you define a layer and use it in a model, the Python code you write in the overridden call method is not executed directly during the forward or backward passes. Instead, it is called only once, while the model is being built (e.g. when you call model.compile). Keras traces that Python code into a computational graph, and that graph, in which the tensors flow, is what does the computations during training and prediction.

That's why putting a plain print statement in call won't help you debug the model; you need to use tf.print to add a print operation to the graph.
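For instance, here is a minimal sketch of the difference between the two (the layer name here is just illustrative):

import tensorflow as tf

class DebugLayer(tf.keras.layers.Layer):
    def call(self, inputs):
        # Plain print runs only while call() is being traced into a graph,
        # so you will see it once, not once per batch:
        print('tracing call()')
        # tf.print becomes an op inside the graph, so it runs on every batch:
        tf.print('executing call()')
        return inputs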

It is the same situation with the state variable you want to have. Instead of simply assigning old + update to new, you need to call a Keras function that adds that assignment operation to the graph.

Also note that tensors are immutable, so you need to define the state as a tf.Variable in the __init__ method.

So I believe this code is more like what you're looking for:

import tensorflow as tf

class CustomLayer(tf.keras.layers.Layer):
  def __init__(self, **kwargs):
    super(CustomLayer, self).__init__(**kwargs)
    # The state must be a tf.Variable so it can be mutated inside the graph.
    # Its shape (3,) matches the per-batch reduction done in call() below,
    # so that the assignment in K.update is shape-compatible.
    self.state = tf.Variable(tf.zeros((3,), 'float32'))
    self.constant = tf.constant([[1, 1, 1], [1, 0, -1], [-1, 0, 1]], 'float32')
    self.extra_constant = tf.constant([[1, 1, 1], [1, 0, -1], [-1, 0, 1]], 'float32')
    # The layer has no weights to learn.
    self.trainable = False

  def call(self, X):
    m = self.constant
    c = self.extra_constant
    outputs = self.state + tf.matmul(X, m) + c
    # Adds the assignment op to the graph; its return value does not need
    # to be used anywhere.
    tf.keras.backend.update(self.state, tf.reduce_sum(outputs, axis=0))

    return outputs
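Not from the original answer, but a quick eager-mode sketch (batch size 3 so the shapes line up with the 3x3 constants) to check that the state actually changes between calls:

layer = CustomLayer()
x = tf.ones((3, 3), 'float32')

print(layer.state.numpy())  # all zeros before the first call
layer(x)                    # forward pass; updates the state as a side effect
print(layer.state.numpy())  # changed after the first batch
layer(x)
print(layer.state.numpy())  # keeps accumulating from batch to batch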
Mohammad Jafar Mashhadi
  • Hi Mohammad, I tried what you suggested but it did not work for me. I don't know why, but tf.keras.backend.update() does not update the value of self.state unless I assign its result to a new variable like X_tmp = tf.keras.backend.update(self.state, ...) and then make the output of the network equal to X_tmp (but then I get the ValueError). Thanks anyway for your reply! – d_gg Mar 09 '20 at 10:21
  • Please check out this notebook that I made to test the code before posting an answer: https://colab.research.google.com/gist/MJafarMashhadi/7fe9e90e615ab6fa749e60555a92de34/sotest.ipynb It works without assigning `update`'s return value to anything else – Mohammad Jafar Mashhadi Mar 10 '20 at 02:50
  • My bad, it actually looks like it's working. One more question: how do I reset the value of self.state back to 0 at the end of each epoch? Otherwise its value would just keep increasing during training. Thank you in advance! – d_gg Mar 10 '20 at 13:21
  • That'd be a new question. AFAIK there are training callbacks you can pass to your fit call; one of them is `on_epoch_end`. You have access to the model and all its layers there. I haven't tested it, but you should be able to reset the state to zero there; see the sketch below. – Mohammad Jafar Mashhadi Mar 10 '20 at 19:05
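A hedged sketch of that callback idea (not tested in the thread; ResetStateCallback and the way the layer is passed in are just illustrative):

class ResetStateCallback(tf.keras.callbacks.Callback):
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def on_epoch_end(self, epoch, logs=None):
        # Overwrite the state variable with zeros at the end of every epoch.
        self.layer.state.assign(tf.zeros_like(self.layer.state))

# usage: model.fit(x, y, callbacks=[ResetStateCallback(custom_layer)])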