3

I build my model using tf.keras.layers.Dense. In the first layer of my model I want some of the weights to be constant zero. In the gradient calculation these weights should then get a gradient of zero (since the last term in the chain rule corresponds to the weight, which is 0 for a constant). This is my approach so far:

import tensorflow as tf
import tensorflow.contrib.eager as tfe    
import numpy as np

tf.enable_eager_execution()

model = tf.keras.Sequential([
  tf.keras.layers.Dense(2, activation=tf.sigmoid, input_shape=(2,)),
  tf.keras.layers.Dense(2, activation=tf.sigmoid)
])

weights = [np.array([[tf.constant(0), 0.25], [0.2, 0.3]]),
           np.array([0.35, 0.35]),
           np.array([[0.4, 0.5], [0.45, 0.55]]),
           np.array([0.6, 0.6])]
model.set_weights(weights)

def loss(model, x, y):
  y_ = model(x)
  return tf.losses.mean_squared_error(labels=y, predictions=y_)

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return loss_value, tape.gradient(loss_value, model.trainable_variables) 

But in the gradient calculation, the weight tf.constant(0) gets a gradient that is not zero. Am I misunderstanding something?

How can I set a weight (or some weights) in a layer (not all weights of one layer) to a constant value which should not change during training?

  • You might find [this](https://stackoverflow.com/questions/50290769/specify-connections-in-nn-in-keras?answertab=votes#tab-top) useful. – rvinas Jan 09 '19 at 15:13
  • Do I understand the approach in your link right that they set the weights to zero by multiplying with a matrix (0 if the weight should be zero, 1 otherwise)? If that is the case, how do I prevent these weights from learning? Or do I have to set the weights back to zero after every optimization step? – Ev4 Jan 10 '19 at 07:27
  • That's right! In this case, the gradient of the loss wrt. those will be 0 (when you apply the chain rule, the last factor is 0), so these weights won't be updated. You do *not* need to set these weights to zero after each training step. – rvinas Jan 10 '19 at 15:32
  • @rvinas When I implement this approach, the gradient for the weights which are zero is not zero. I looked at the gradient calculation and in my understanding the last factor of the chain rule is equal to the input: the last factor is (d net_j)/(d weight_ij) with net_j = ... + weight_ij * input_i + ..., so (d net_j)/(d weight_ij) = input_i. Do I understand this wrong? Did you implement the approach with the mask matrix and get 0 as a gradient for the zero weights? – Ev4 Jan 11 '19 at 08:09
  • The gradient for those weights should be zero. When you multiply the weight matrix by the connection matrix, net_j is actually net_j = ... + weight_ij * connection_ij * input_i + ..., so (d net_j)/(d weight_ij) = connection_ij * input_i, which is zero whenever connection_ij = 0. – rvinas Jan 11 '19 at 09:48
  • Please see the example from my answer – rvinas Jan 11 '19 at 10:15

1 Answer

3

My answer is based on the CustomConnected layer from this answer. As I said in a comment, when you multiply a weight w_ij by c_ij = 0 via the connections matrix, the gradient of the loss with respect to that weight becomes zero as well, since the last factor in the chain rule, (d net_j)/(d w_ij) = c_ij * input_i, is zero when c_ij = 0.
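
For completeness, here is a rough sketch of what such a CustomConnected layer can look like (see the linked answer for the original code; details there may differ slightly): a Dense layer whose kernel is multiplied element-wise by a fixed 0/1 connections matrix before the matrix product, so masked weights neither contribute to the output nor receive a gradient.

class CustomConnected(tf.keras.layers.Dense):
    def __init__(self, units, connections, **kwargs):
        super(CustomConnected, self).__init__(units, **kwargs)
        # 'connections' is the 0/1 mask of shape (input_dim, units)
        self.connections = connections

    def call(self, inputs):
        # Mask the kernel before the matrix multiplication
        output = tf.keras.backend.dot(inputs, self.kernel * self.connections)
        if self.use_bias:
            output = tf.keras.backend.bias_add(output, self.bias)
        if self.activation is not None:
            output = self.activation(output)
        return output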

Here is a minimal example in Keras:

# Using CustomConnected from:
# https://stackoverflow.com/questions/50290769/specify-connections-in-nn-in-keras  
import tensorflow as tf
import numpy as np

tf.enable_eager_execution()

# Define model
inp = tf.keras.layers.Input(shape=(2,))
c = np.array([[1., 1.], [1., 0.]], dtype=np.float32)
h = CustomConnected(2, c, use_bias=False)(inp)  # no bias, so set_weights below only needs the kernel
model = tf.keras.models.Model(inp, h)

# Set initial weights and compile
w = [np.random.rand(2, 2) * c]
model.set_weights(w)
model.compile(tf.train.AdamOptimizer(), 'mse')

# Check gradients
x = tf.constant(np.random.rand(10, 2), dtype=tf.float32)
y = np.random.rand(10, 2)

with tf.GradientTape() as tape:
    loss_value = tf.losses.mean_squared_error(labels=y, predictions=model(x))
    grad = tape.gradient(loss_value, model.trainable_variables)
    print('Gradients: ', grad[0])

Note that I set c[1,1]=0 so the gradient corresponding to weight w[1,1] is 0 regardless of the input.
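
If you also want to check that an optimization step leaves the masked weight untouched, something along these lines should work (still in eager mode, reusing grad from above; Adam's update is exactly zero when the gradient is zero):

# One optimizer step; w[1, 1] should remain exactly 0
optimizer = tf.train.AdamOptimizer()
optimizer.apply_gradients(zip(grad, model.trainable_variables))
print('w[1, 1] after one step:', model.get_weights()[0][1, 1])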

  • Thanks for your clear explanation and your minimal example. You helped me a lot to fix my problem! – Ev4 Jan 11 '19 at 13:13