
I am working on a custom problem, and I have to change the fully connected layer (Dense with softmax). My model code looks something like this (using the Keras framework):

.......
batch_size = 8
inputs = tf.random.uniform(shape=[batch_size, 1024, 256], dtype=tf.dtypes.float32)
preds = Dense(num_classes, activation='softmax')(x)  # final layer with softmax activation
....
model = Model(inputs=base_model.input,outputs=preds)

So, I have to change the code of the Dense layer so that it outputs a tensor of probabilities with shape [batch_size, 1024, num_classes], without using a for loop; I need it to be optimized and not time-consuming.

The Dense layer code that I want to change:

# Imports assumed here for standalone Keras 2.x (the original snippet omits them):
from keras import activations, constraints, initializers, regularizers
from keras import backend as K
from keras.layers import InputSpec, Layer


class Dense(Layer):
"""Just your regular densely-connected NN layer.

`Dense` implements the operation:
`output = activation(dot(input, kernel) + bias)`
where `activation` is the element-wise activation function
passed as the `activation` argument, `kernel` is a weights matrix
created by the layer, and `bias` is a bias vector created by the layer
(only applicable if `use_bias` is `True`).

Note: if the input to the layer has a rank greater than 2, then
it is flattened prior to the initial dot product with `kernel`.

# Example

```python
    # as first layer in a sequential model:
    model = Sequential()
    model.add(Dense(32, input_shape=(16,)))
    # now the model will take as input arrays of shape (*, 16)
    # and output arrays of shape (*, 32)

    # after the first layer, you don't need to specify
    # the size of the input anymore:
    model.add(Dense(32))
```

# Arguments
    units: Positive integer, dimensionality of the output space.
    activation: Activation function to use
        (see [activations](../activations.md)).
        If you don't specify anything, no activation is applied
        (ie. "linear" activation: `a(x) = x`).
    use_bias: Boolean, whether the layer uses a bias vector.
    kernel_initializer: Initializer for the `kernel` weights matrix
        (see [initializers](../initializers.md)).
    bias_initializer: Initializer for the bias vector
        (see [initializers](../initializers.md)).
    kernel_regularizer: Regularizer function applied to
        the `kernel` weights matrix
        (see [regularizer](../regularizers.md)).
    bias_regularizer: Regularizer function applied to the bias vector
        (see [regularizer](../regularizers.md)).
    activity_regularizer: Regularizer function applied to
        the output of the layer (its "activation").
        (see [regularizer](../regularizers.md)).
    kernel_constraint: Constraint function applied to
        the `kernel` weights matrix
        (see [constraints](../constraints.md)).
    bias_constraint: Constraint function applied to the bias vector
        (see [constraints](../constraints.md)).

# Input shape
    nD tensor with shape: `(batch_size, ..., input_dim)`.
    The most common situation would be
    a 2D input with shape `(batch_size, input_dim)`.

# Output shape
    nD tensor with shape: `(batch_size, ..., units)`.
    For instance, for a 2D input with shape `(batch_size, input_dim)`,
    the output would have shape `(batch_size, units)`.
"""

def __init__(self, units,
             activation=None,
             use_bias=True,
             kernel_initializer='glorot_uniform',
             bias_initializer='zeros',
             kernel_regularizer=None,
             bias_regularizer=None,
             activity_regularizer=None,
             kernel_constraint=None,
             bias_constraint=None,
             **kwargs):
    if 'input_shape' not in kwargs and 'input_dim' in kwargs:
        kwargs['input_shape'] = (kwargs.pop('input_dim'),)
    super(Dense, self).__init__(**kwargs)
    self.units = units
    self.activation = activations.get(activation)
    self.use_bias = use_bias
    self.kernel_initializer = initializers.get(kernel_initializer)
    self.bias_initializer = initializers.get(bias_initializer)
    self.kernel_regularizer = regularizers.get(kernel_regularizer)
    self.bias_regularizer = regularizers.get(bias_regularizer)
    self.activity_regularizer = regularizers.get(activity_regularizer)
    self.kernel_constraint = constraints.get(kernel_constraint)
    self.bias_constraint = constraints.get(bias_constraint)
    self.input_spec = InputSpec(min_ndim=2)
    self.supports_masking = True

def build(self, input_shape):
    assert len(input_shape) >= 2 
    input_dim = input_shape[-1]  

    self.kernel = self.add_weight(shape=(input_dim, self.units),
                                  initializer=self.kernel_initializer,
                                  name='kernel',
                                  regularizer=self.kernel_regularizer,
                                  constraint=self.kernel_constraint)
    if self.use_bias:
        self.bias = self.add_weight(shape=(self.units,),
                                    initializer=self.bias_initializer,
                                    name='bias',
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
    else:
        self.bias = None
    self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
    self.built = True

def call(self, inputs):
    output = K.dot(inputs, self.kernel)
    if self.use_bias:
        output = K.bias_add(output, self.bias)
    if self.activation is not None:
        output = self.activation(output)
    return output

def compute_output_shape(self, input_shape):
    assert input_shape and len(input_shape) >= 2
    assert input_shape[-1]
    output_shape = list(input_shape)
    output_shape[-1] = self.units
    return tuple(output_shape)

def get_config(self):
    config = {
        'units': self.units,
        'activation': activations.serialize(self.activation),
        'use_bias': self.use_bias,
        'kernel_initializer': initializers.serialize(self.kernel_initializer),
        'bias_initializer': initializers.serialize(self.bias_initializer),
        'kernel_regularizer': regularizers.serialize(self.kernel_regularizer),
        'bias_regularizer': regularizers.serialize(self.bias_regularizer),
        'activity_regularizer': regularizers.serialize(self.activity_regularizer),
        'kernel_constraint': constraints.serialize(self.kernel_constraint),
        'bias_constraint': constraints.serialize(self.bias_constraint)
    }
    base_config = super(Dense, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))
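
A side note on the quoted code: despite the docstring's claim that inputs of rank greater than 2 are flattened, `K.dot` with a rank-3 input contracts only the last axis, so this layer already maps a (batch_size, 1024, 256) input to (batch_size, 1024, units). A minimal check, assuming standalone Keras with the TensorFlow backend:

import numpy as np
from keras import backend as K

x = K.constant(np.random.rand(8, 1024, 256))  # rank-3 input
w = K.constant(np.random.rand(256, 32))       # kernel as created in Dense.build
print(K.int_shape(K.dot(x, w)))               # (8, 1024, 32): only the last axis is contracted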
Adelovi
  • What is 1024? And besides, are you aware that the `Dense` layer [is applied on the last axis by default](https://stackoverflow.com/a/52092176/2099607)? – today Aug 20 '20 at 15:10
  • I want a single layer with [1024, 256] as input shape which will output [1024, num_classes]. PS: I dropped the batch size dimension to make it clearer. Also, I corrected the shape order. – Adelovi Aug 20 '20 at 16:01
  • How do you plan to train it? Do you have your labels in shape [batch_size, 1024, num_classes]? – mujjiga Aug 20 '20 at 18:20
  • @RaniaRano Then why don't you use the `Dense` layer itself? What's wrong with it? – today Aug 20 '20 at 18:50
  • @today @mujjiga There is more to do later, of course; this Dense layer is not the whole idea, I have changed a lot of things. – Adelovi Aug 21 '20 at 23:48

1 Answer


There are three different ways in which this can be done (that I can think of). If you want a single dense layer that maps a vector of 256 elements to a vector of num_classes elements and applies it across your whole batch of data (that is, uses the same 256 x num_classes weight matrix for every sample and every one of the 1024 positions), then you don't need to do anything special: just use a regular Dense layer:

import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense

batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = Dense(num_classes, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2570
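
As a quick sanity check that this first approach really shares one 256 x num_classes weight matrix across all 1024 positions, applying the layer to a single position gives the same result as slicing the full output. A minimal sketch, assuming TF 2.x eager execution:

import tensorflow as tf
from tensorflow.keras.layers import Dense

num_classes = 10
x = tf.random.uniform((8, 1024, 256))
layer = Dense(num_classes, activation='softmax')

full = layer(x)             # (8, 1024, 10): kernel applied along the last axis
single = layer(x[:, 0, :])  # same layer applied to position 0 only: (8, 10)
print(float(tf.reduce_max(tf.abs(full[:, 0, :] - single))))  # ~0.0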

Another way would be to have a single huge Dense layer that takes all 1024 * 256 values in at the same time and produces all 1024 * num_classes values at the output, that is, a layer with a weight matrix of shape (1024 * 256) x (1024 * num_classes) (on the order of gigabytes of memory!). This is easy to do too, although it seems unlikely to be what you need:

import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Flatten, Dense, Reshape, Softmax

batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
res = Flatten()(inp)
# This takes _a lot_ of memory!
layer = Dense(1024 * num_classes, activation=None)
out_res = layer(res)
# Apply softmax after reshaping
out_preact = Reshape((-1, num_classes))(out_res)
out = Softmax()(out_preact)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2684364800
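
For scale: the kernel alone has (1024 * 256) x (1024 * 10) = 262,144 x 10,240 = 2,684,354,560 weights, plus 10,240 biases, which matches the 2,684,364,800 parameters printed above, or roughly 10.7 GB just for the weights in float32.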

Finally, you may want to have a set of 1024 weight matrices, each one applied to the corresponding position in the input, which would imply an array of weights with shape (1024, 256, num_classes). I don't think this can be done with one of the standard Keras layers (or I don't know how to)¹, but it's easy enough to write a custom layer based on Dense to do that:

import tensorflow as tf
from tensorflow.keras.layers import Dense, InputSpec

class Dense2D(Dense):
    def __init__(self, *args, **kwargs):
        super(Dense2D, self).__init__(*args, **kwargs)

    def build(self, input_shape):
        assert len(input_shape) >= 3
        input_dim1 = input_shape[-2]
        input_dim2 = input_shape[-1]

        self.kernel = self.add_weight(shape=(input_dim1, input_dim2, self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(input_dim1, self.units),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None
        self.input_spec = InputSpec(min_ndim=3, axes={-2: input_dim1, -1: input_dim2})
        self.built = True

    def call(self, inputs):
        # Contract each position's feature vector with that position's own weight matrix
        output = tf.einsum('...ij,ijk->...ik', inputs, self.kernel)
        if self.use_bias:
            output += self.bias
        if self.activation is not None:
            output = self.activation(output)
        return output

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 3
        assert input_shape[-1]
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

You would then use it like this:

import tensorflow as tf
from tensorflow.keras import Input

batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = Dense2D(num_classes, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2631680

¹ As today points out in the comments, you can actually use a LocallyConnected1D layer to do the same thing as my Dense2D layer. It is as simple as this:

import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import LocallyConnected1D

batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = LocallyConnected1D(num_classes, 1, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2631680
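
For reference, both Dense2D and this LocallyConnected1D layer hold 1024 x 256 x 10 = 2,621,440 kernel weights plus 1024 x 10 = 10,240 biases, which is where the 2,631,680 parameters printed in both cases come from.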
jdehesa
    The third approach could be done using a [`LocallyConnected1D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LocallyConnected1D) layer with kernel size of 1 and `num_classes` filters. No need for a custom layer. – today Aug 20 '20 at 18:56
    @today Ah, thought there had to be something for that, but could only think of separable convolution, which is not useful here. Maybe you can add that as an answer. – jdehesa Aug 20 '20 at 23:45
    Feel free to edit your answer to add that, if you'd like; it isn't worth writing a new answer ;) – today Aug 21 '20 at 10:07
  • @jdehesa @today Great, Dense2D is what I am looking for, but there is one thing: is there a way to pass all 1024 tensors through the same weights? In the code you declared `shape=(input_dim1, input_dim2, self.units)`, but I want only one set of Dense weights applied to all 1024 tensors. Is that possible without loops? – Adelovi Aug 21 '20 at 23:07
  • @RaniaRano I'm not sure I understand, isn't that like the first of the three cases I put? – jdehesa Aug 22 '20 at 00:16
  • @jdehesa It's the Dense2D one, but there you created weights with shape `[input_dim1, input_dim2, units]` to multiply against the [1024, 256] input, whereas I need one set of the same weights shared across all of the [1024, 256] input, i.e. shared weights; this is why @today's solution (LocallyConnected1D) doesn't fit my problem. – Adelovi Aug 22 '20 at 00:26
  • @RaniaRano So you mean you want only one 256 x num_classes weight matrix, repeated 1024 times and applied to the input with shape `(None, 1024, 256)`, is that what you mean? But that is exactly what the first solution does. It creates a weight matrix for the 256 features, then broadcasts it across the other dimensions, so every one of the 1024 positions uses the same weights. I'm sorry if you mean something else, but I'm not understanding the difference. – jdehesa Aug 22 '20 at 07:53
  • @RaniaRano Please do yourself and us a favor and read [this answer](https://stackoverflow.com/a/52092176/2099607) (if you haven't already) to understand how the Dense layer is applied to its input. Then, if it doesn't fit your requirements, come back and discuss this further. – today Aug 22 '20 at 08:25
  • @jdehesa @today My last question: how can I make this line work on TF 1.x? `output = tf.gather_nd(input, [(i,j) for (i,j) in enumerate(z)])` – Adelovi Aug 23 '20 at 18:06
  • @RaniaRano Maybe you could open a new question for that with more details, but assuming `z` is a 1D integer tensor, I suppose it would be something like `output = tf.gather_nd(input, tf.stack([tf.range(tf.size(z, out_type=z.dtype)), z], axis=1))` (a short check of this pattern is sketched after these comments). – jdehesa Aug 24 '20 at 09:18
  • `z` is a 2D tensor. I already posted a question [here](https://stackoverflow.com/questions/63537430/the-asterisk-in-tf-gather-nd-in-python2-7-rise-syntax-error). – Adelovi Aug 24 '20 at 09:41
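
A small check of the gather_nd pattern suggested in the comment above, assuming `z` is a 1D integer tensor with one column index per row of the input (the 2D case from the follow-up question would need a different index construction). The ops exist in TF 1.x as well; there you would evaluate the result in a session instead of calling .numpy():

import tensorflow as tf

inp = tf.constant([[10, 11, 12],
                   [20, 21, 22]])
z = tf.constant([2, 0])  # one column index per row of inp
idx = tf.stack([tf.range(tf.size(z, out_type=z.dtype)), z], axis=1)
print(tf.gather_nd(inp, idx).numpy())  # [12 20] -> inp[0, 2] and inp[1, 0]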