
I want to implement a classifier with a sparse input layer. My data has about 60 dimensions and I want to check for feature importance. To do this I want the first layer to have a diagonal weight matrix (to which I want to apply an L1 kernel regularizer); all off-diagonal entries should be non-trainable zeros. So a one-to-one connection per input channel; a Dense layer would mix the input variables. I checked "Specify connections in NN (in keras)" and "Custom connections between layers Keras". The latter I could not use, as Lambda layers do not introduce trainable weights.

Something like this, however, does not affect the actual weight matrix:

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class MyLayer(Layer):
    def __init__(self, output_dim, connection, **kwargs):
        self.output_dim = output_dim
        self.connection = connection
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        # keep only the diagonal, then rebuild a diagonal matrix from it
        self.kernel = tf.linalg.tensor_diag_part(self.kernel)
        self.kernel = tf.linalg.tensor_diag(self.kernel)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

When I train the model and print the weights, I do not get a diagonal matrix for the first layer.

What am I doing wrong?

  • The weights should be just the diagonal elements of the matrix (a 1D vector), which you would use to build the actual weights matrix. – jdehesa Dec 12 '18 at 13:55
  • I think it's probably working, but you're not "eliminating" the original weight matrix. It still exists. The only way to confirm this is working is to print the weights before and after training and see that only the desired part is getting updated. – Daniel Möller Dec 12 '18 at 14:32
  • You are right, it was just not visible through simple printing the weights. Thanks – sim Dec 14 '18 at 12:54
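
A minimal way to check Daniel's suggestion, assuming `model` is the compiled classifier and `X`, `y` are the training data (these names are placeholders, not from the thread):

import numpy as np

# snapshot the first layer's kernel, train, then compare
before = model.layers[0].get_weights()[0].copy()
model.fit(X, y, epochs=5, verbose=0)
after = model.layers[0].get_weights()[0]

# only the diagonal entries should have changed
changed = ~np.isclose(before, after)
print(np.argwhere(changed))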

1 Answer


Not quite sure what you want to do exactly, because, to me, diagonal is something for a square matrix, implying your layer's input and output dimensionality should be unchanged.

Anyway, let's talk about the square matrix case first. I think there are two ways of implementing a weight matrix with all zero values off the diagonal.

Method 1: follow the square matrix idea only conceptually, and implement this layer with a trainable weight vector, as follows.

# instead of writing y = K.dot(x, W),
# where W is the NxN weight matrix with zero values off the diagonal,
# write y = x * w, where w is the 1xN weight vector
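
A minimal sketch of Method 1 as a custom layer, including the L1 penalty the question asks for (the class name `DiagonalDense` and the default `l1` strength are illustrative, not part of the answer):

from tensorflow.keras import regularizers
from tensorflow.keras.layers import Layer

class DiagonalDense(Layer):
    """One trainable weight per input feature: y = x * w."""
    def __init__(self, l1=0.01, **kwargs):
        super(DiagonalDense, self).__init__(**kwargs)
        self.l1 = l1

    def build(self, input_shape):
        # a single length-N vector instead of an NxN matrix
        self.w = self.add_weight(name='w',
                                 shape=(input_shape[-1],),
                                 initializer='ones',
                                 regularizer=regularizers.l1(self.l1),
                                 trainable=True)
        super(DiagonalDense, self).build(input_shape)

    def call(self, x):
        # element-wise multiplication, equivalent to x @ diag(w)
        return x * self.w

    def compute_output_shape(self, input_shape):
        return input_shape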

Method 2: use the default Dense layer, but with your own constraint.

from tensorflow.keras import backend as K
from tensorflow.keras.constraints import Constraint

# all you need is a mask matrix M, which is an NxN identity matrix,
# and a constraint like the one below
class DiagonalWeight(Constraint):
    """Constrains the weights to be diagonal."""
    def __call__(self, w):
        N = K.int_shape(w)[-1]
        m = K.eye(N)
        w *= m
        return w

Of course, you should use Dense( ..., kernel_constraint=DiagonalWeight()).
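
For context, a usage sketch tying this back to the question's setup: a 60-feature input, the diagonal constraint, and an L1 kernel regularizer (the layer sizes and the 0.01 strength are illustrative assumptions):

from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    # square 60x60 kernel, constrained to its diagonal
    Dense(60, use_bias=False, input_shape=(60,),
          kernel_constraint=DiagonalWeight(),
          kernel_regularizer=regularizers.l1(0.01)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')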

  • Yeah I want to use a square weight matrix. Your suggestion is much easier than my custom layer approach, thanks. – sim Dec 14 '18 at 13:58
  • Using TensorFlow 2, I ran into `ValueError: tf.function-decorated function tried to create variables on non-first call.`. I fixed it by creating the mask when the `DiagonalWeight` instance is first instantiated in the `__init__`. – Luke Jan 25 '21 at 00:22
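
A sketch of the variant Luke describes, with the mask built once at construction time (the constructor argument `N` is an assumption about how the size would be passed in):

import tensorflow as tf
from tensorflow.keras.constraints import Constraint

class DiagonalWeight(Constraint):
    """Diagonal constraint with the mask created in __init__."""
    def __init__(self, N):
        # building the identity mask here avoids creating new tensors
        # inside the tf.function-traced call in TensorFlow 2
        self.mask = tf.eye(N)

    def __call__(self, w):
        return w * self.mask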