
I am building a custom Keras layer that is essentially the softmax function with a trainable base parameter. While the layer works on its own, when it is placed inside a sequential model, model.summary() reports its output shape as None and model.fit() raises a presumably related exception:

ValueError: as_list() is not defined on an unknown TensorShape.

In other custom layers (including, obviously, the Linear example from Keras) the output shape can be determined after .build() is called. Looking at model.summary()'s source code, as well as keras.layers.Layer, there is a @property Layer.output_shape that fails to determine the output shape automatically.

I then tried overriding the property and manually returning the input_shape argument passed to my layer's .build() method after saving it (softmax does not change the shape of its input), but this didn't work either: if I call super().output_shape before returning my value, model.summary() reports the shape as ?, while if I don't, the value may appear correct, but in both cases I get the exact same error during .fit().

Is there something special about the code inside call() that prevents Keras from understanding the shape of the output?
Alternatively, is there a piece of documentation I have missed?

My layer:

import tensorflow as tf
from tensorflow import keras

class B_Softmax(keras.layers.Layer):
    def __init__(self, b_init_mean=10, b_init_var=0.001):
        super(B_Softmax, self).__init__()
        self.b_init = tf.random_normal_initializer(b_init_mean, b_init_var)
        self._out_shape = None
        
    def build(self, input_shape):
        self.b = tf.Variable(
            initial_value = self.b_init(shape=(1,), dtype='float32'),
            trainable=True
        )
        self._out_shape = input_shape

    def call(self, inputs):
        # This is an implementation of Softmax for batched inputs
        # where the factor b is added to the exponents
        nominators  = tf.math.exp(self.b * inputs)
        denominator = tf.reduce_sum(nominators, axis=1)
        denominator = tf.squeeze(denominator)
        denominator = tf.expand_dims(denominator, -1)
        s           = tf.divide(nominators, denominator)
        return s

    @property
    def output_shape(self):    # If I comment out this function, summary prints 'None'
        self.output_shape      # If I leave this line, summary prints '?' 
        return self._out_shape # If the above line is commented out, summary prints '10' (correctly)
                               # but the same error is triggered in all three cases

The layer works on its own:

>>> A     = tf.constant([[1,2,3], [7,5,6]], dtype="float32")
>>> layer = B_Softmax(1.0)
>>> layer(A)
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.08991686, 0.24461554, 0.6654676 ],
       [0.6654677 , 0.08991687, 0.24461551]], dtype=float32)>

But when I try to include it inside a model, the summary doesn't look right:

from tensorflow.keras.layers import Dense

input_dim = 5
num_classes = 10
model = keras.Sequential([
        Dense(32, activation='relu', input_shape=(input_dim,)),
        Dense(num_classes, activation="softmax"),
        B_Softmax(1.0)
])
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_10 (Dense)            (None, 32)                192       
                                                                 
 dense_11 (Dense)            (None, 10)                330       
                                                                 
 b__softmax_18 (B_Softmax)   None                      1          <-- "None", "?", or "10" (in a hacky way) may be printed
                                                                 
=================================================================
Total params: 523
Trainable params: 523
Non-trainable params: 0

And training fails:

batch_size = 128
epochs = 15
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
ValueError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 894, in train_step
        return self.compute_metrics(x, y, y_pred, sample_weight)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 987, in compute_metrics
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 480, in update_state
        self.build(y_pred, y_true)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 398, in build
        y_pred)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 526, in _get_metric_objects
        return [self._get_metric_object(m, y_t, y_p) for m in metrics]
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 526, in <listcomp>
        return [self._get_metric_object(m, y_t, y_p) for m in metrics]
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 548, in _get_metric_object
        y_p_rank = len(y_p.shape.as_list())

    ValueError: as_list() is not defined on an unknown TensorShape.
  • Note about possible duplicate: [This question](https://stackoverflow.com/questions/49527159/how-to-get-the-output-shape-of-a-layer-in-keras) asks how to determine the output shape, but it is not about a *custom* layer for which the usual approach fails; secondly, its accepted answer points to a documentation page for layer.output_shape, but the relevant information is clearly missing from that page. – kyriakosSt Nov 06 '22 at 21:08
  • 1
    I would assume that `tf.squeeze` causes issues here, as TF may not be able to determine how many dimnesions will be "squeezed away". Can't you just use `keepdims=True` in `reduce_sum` instead of squeze and expand? I assume you are doing this to correctly match the dimensions for broadcasting. – xdurch0 Nov 06 '22 at 21:43
  • 1
    @xdurch0 Oh God, you are *absolutely correct*!! In fact I can simply remove `squeeze()` and let tf's auto broadcasting rules work out the intended dimensions. This works perfectly! A prime example of (badly applied) defensive programming introducing bugs. At any case, if you would like, you can formulate your comment as an answer to accept. – kyriakosSt Nov 06 '22 at 22:02
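
To see why `tf.squeeze` breaks static shape inference here, a minimal sketch (using a symbolic Keras input to mimic what happens inside the model, not code from the original thread):

import tensorflow as tf

# Symbolic input with a dynamic batch dimension, as inside a Keras model
x = tf.keras.Input(shape=(10,))

summed = tf.reduce_sum(tf.math.exp(x), axis=1)
print(summed.shape)              # (None,) -- the rank is still known

# Without an explicit axis, squeeze may or may not remove the None dimension,
# so the static shape degrades to "unknown"; this is what later makes
# y_pred.shape.as_list() fail inside compile_utils.
print(tf.squeeze(summed).shape)  # <unknown>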

3 Answers


This doesn't directly solve the issue but rather side-steps it: instead of using squeeze and expand_dims, the former of which seems to be problematic for TensorFlow to keep track of, we use keepdims=True in the summation to keep the axes aligned correctly for the softmax denominator.

def call(self, inputs):
    # This is an implementation of Softmax for batched inputs
    # where the factor b is added to the exponents
    nominators  = tf.math.exp(self.b * inputs)
    denominator = tf.reduce_sum(nominators, axis=1, keepdims=True)
    s           = tf.divide(nominators, denominator)
    return s

Arguably, it would be much preferable to make use of the built-in softmax:

def call(self, inputs):
    return tf.nn.softmax(self.b * inputs)
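
As a quick check, a minimal sketch (assuming B_Softmax is otherwise defined as in the question, with the fixed call() above): the output shape is inferred again and summary() can report it.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

input_dim = 5
num_classes = 10

model = keras.Sequential([
    Dense(32, activation='relu', input_shape=(input_dim,)),
    Dense(num_classes, activation="softmax"),
    B_Softmax(1.0),
])
model.summary()   # the B_Softmax row should now show (None, 10) instead of None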
  • Hi, thanks for formulating the answer. This works. In fact, `keepdims=True` is not even needed. About using tf's softmax, indeed I should have done that. When I started writing the layer I wanted to modify the softmax function a bit, that's why I started re-implementing it – kyriakosSt Nov 07 '22 at 13:09

You could implement the compute_output_shape method in your Layer subclass:

def compute_output_shape(self, input_shape):
    return [(None, out_shape)]

Where out_shape contains the dimensionality of the output, or you can replace the whole tuple to have any output shape you want.
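
For this particular layer, where softmax preserves the shape of its input, a minimal sketch of the method (added to the B_Softmax class from the question) could simply pass the input shape through:

def compute_output_shape(self, input_shape):
    # B_Softmax does not change the shape of its input,
    # so the output shape is the input shape itself.
    return input_shape

Note that this only fixes what summary() reports; as discussed in the comments, the error during fit() comes from tf.squeeze inside call(), which has to be fixed as well.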

  • Hi, thanks for this suggestion. I have found this to be equivalent to overriding the `output_shape` property. Indeed `summary()` correctly displays 10, but the error during training remains, due to TF being unable to determine the effect of `squeeze()`. I think this might be a bug (or undocumented behavior) in TensorFlow. – kyriakosSt Nov 07 '22 at 13:13

I found no problem with the code itself; I think the input parameters are what matter. Here are the remarks on what I changed:

  1. I use a tf.data.Dataset with model.fit(), without a validation split, because I create a sample with only one record.
  2. Model input: I changed the input shape from (5,) to (1, 5) so that it matches the shape of the dataset I create (that is why an extra dimension appears in the summary).
  3. Categorization, batch size, number of classes, loss function, and optimizer are adjusted to the output dimensions of the network via the num_classes parameter.

Sample: << the loss function does not determine the model, the input and output do; it is not necessary to show how everything was created, so what is not used has been removed or commented out >>

import tensorflow as tf

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Class / Definition
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
class B_Softmax(tf.keras.layers.Layer):
    def __init__(self, b_init_mean=10, b_init_var=0.001):
        super(B_Softmax, self).__init__()
        self.b_init = tf.random_normal_initializer(b_init_mean, b_init_var)
        self._out_shape = None
        
    def build(self, input_shape):
        self.b = tf.Variable(
            initial_value = self.b_init(shape=(1,), dtype='float32'),
            trainable=True
        )
        self._out_shape = input_shape

    def call(self, inputs):
        # This is an implementation of Softmax for batched inputs
        # where the factor b is added to the exponents
        nominators  = tf.math.exp(self.b * inputs)
        denominator = tf.reduce_sum(nominators, axis=1)
        denominator = tf.squeeze(denominator)
        denominator = tf.expand_dims(denominator, -1)
        s           = tf.divide(nominators, denominator)
        return s

    # @property
    # def output_shape(self):    # If I comment out this function, summary prints 'None'
        # self.output_shape      # If I leave this line, summary prints '?' 
        # return self._out_shape # If the above line is commented out, summary prints '10' (correctly)
    #                           but the same error is triggered in all three cases

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""                              
A = tf.constant([[1,2,3], [7,5,6]], dtype="float32")

batch_size = 128
epochs = 15
input_dim = 5
num_classes = 1

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Dataset
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
start = 3
limit = 16
delta = 3
sample = tf.range( start, limit, delta )
sample = tf.cast( sample, dtype=tf.float32 )
sample = tf.constant( sample, shape=( 1, 1, 1, 5 ) )
dataset = tf.data.Dataset.from_tensor_slices(( sample, tf.constant( [0], shape=( 1, 1, 1, 1 ), dtype=tf.int64)))

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""   
layer = B_Softmax(1.0)
print( layer(A) )

model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_shape=(1, input_dim)),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
        B_Softmax(1.0)
])
model.summary()

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Working
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""   
model.fit(dataset, batch_size=batch_size, epochs=epochs, validation_data=dataset)

Output:

tf.Tensor(
[[0.09007736 0.24477491 0.6651477 ]
 [0.66514784 0.09007736 0.24477486]], shape=(2, 3), dtype=float32)
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 1, 32)             192

 dense_1 (Dense)             (None, 1, 1)              33

 b__softmax_1 (B_Softmax)    None                      1

=================================================================
Total params: 226
Trainable params: 226
Non-trainable params: 0
_________________________________________________________________
Epoch 1/15
1/1 [==============================] - 5s 5s/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 3/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 4/15
1/1 [==============================] - 0s 13ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 5/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 6/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 7/15
1/1 [==============================] - 0s 13ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 8/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 9/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 10/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 11/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 12/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 13/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 14/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 15/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00

  • This code literally reproduces the problem of the layer output shape being None; it seems, again, that you did not understand the question. – Dr. Snoopy Nov 07 '22 at 00:23
  • The code and explanation are correct; you can compare them with other models. The softmax output is significant: it has one parameter output, as the classification class requires. – Jirayu Kaewprateep Nov 07 '22 at 05:10
  • Time significance V.S. scales significant, the time signature is multiplied by values return values determined answer by its method but scales significantly is different input and output from the question he needs to run the codes with some output, at the last layer for mapping label and prediction. – Jirayu Kaewprateep Nov 07 '22 at 05:17
  • None of that made any sense. – Dr. Snoopy Nov 07 '22 at 07:49
  • Hi, thanks for taking the time to answer my question. I have no idea what the code with the generated dataset is supposed to demonstrate, especially since its shape is (1,1,1,5) (which I cannot understand), its output is only one class (I suppose this is why the issue introduced by B_Softmax doesn't appear), and the training output shows no learning. Obviously, I cannot change the dataset just to make the code run. In any case, the problem was solved thanks to @xdurch0's comment. – kyriakosSt Nov 07 '22 at 13:06
  • Hi, my answer is telling you that the output shape shown for the softmax at the last layer follows from its function; you can compare it with a standard model: one parameter is output, as designed. I also see in his answer that the dimension is squeezed before feeding, but to me an input shape such as (1, 1, 32, 32, 3) or (1, 1, 5) is useful so that you do not need to convert it later; the rest is just the input shape. – Jirayu Kaewprateep Nov 08 '22 at 13:00