
I'm currently considering implementing the Self-Attention GAN in Keras. The way I'm thinking of implementing it is as follows:

def Attention(X, channels):
    def hw_flatten(x):
        return np.reshape(x, (x.shape[0], -1, x.shape[-1]))

    f = Conv2D(channels//8, kernel_size=1, strides=1, padding='same')(X)  # [bs, h, w, c']
    g = Conv2D(channels//8, kernel_size=1, strides=1, padding='same')(X)  # [bs, h, w, c']
    h = Conv2D(channels, kernel_size=1, strides=1, padding='same')(X)  # [bs, h, w, c]

    # N = h * w
    flatten_g = hw_flatten(g)
    flatten_f = hw_flatten(f)
    s = np.matmul(flatten_g, flatten_f.reshape((flatten_f.shape[0], flatten_f.shape[-1], -1)))  # [bs, N, N]

    beta = softmax(s, axis=-1)  # attention map

    flatten_h = hw_flatten(h)   # [bs, N, C]
    o = np.matmul(beta, flatten_h)  # [bs, N, C]
    gamma = 0

    o = np.reshape(o, X.shape)  # [bs, h, w, C]
    y = gamma * o + X

    return y

But I have no idea how to add the trainable scalar gamma as described in the SAGAN paper.

I also hope someone can give me some ideas about how to initialize a trainable Keras scalar.


EDIT:

My implementation is now:

import tensorflow as tf
from keras import backend as K
from keras.engine.topology import Layer, InputSpec


class Attention(Layer):
    def __init__(self, ch, **kwargs):
        super(Attention, self).__init__(**kwargs)
        self.channels = ch
        self.filters_f_g = self.channels // 8
        self.filters_h = self.channels

    def build(self, input_shape):
        kernel_shape_f_g = (1, 1) + (self.channels, self.filters_f_g)
        print(kernel_shape_f_g)
        kernel_shape_h = (1, 1) + (self.channels, self.filters_h)

        # Create a trainable weight variable for this layer:
        self.gamma = self.add_weight(name='gamma', shape=[1], initializer='zeros', trainable=True)
        self.kernel_f = self.add_weight(shape=kernel_shape_f_g,
                                        initializer='glorot_uniform',
                                        name='kernel_f')
        self.kernel_g = self.add_weight(shape=kernel_shape_f_g,
                                        initializer='glorot_uniform',
                                        name='kernel_g')
        self.kernel_h = self.add_weight(shape=kernel_shape_h,
                                        initializer='glorot_uniform',
                                        name='kernel_h')
        self.bias_f = self.add_weight(shape=(self.filters_f_g,),
                                      initializer='zeros',
                                      name='bias_F')
        self.bias_g = self.add_weight(shape=(self.filters_f_g,),
                                      initializer='zeros',
                                      name='bias_g')
        self.bias_h = self.add_weight(shape=(self.filters_h,),
                                      initializer='zeros',
                                      name='bias_h')
        super(Attention, self).build(input_shape)
        # Set input spec.
        self.input_spec = InputSpec(ndim=4,
                                    axes={3: input_shape[-1]})
        self.built = True


    def call(self, x):
        def hw_flatten(x):
            return K.reshape(x, shape=[K.shape(x)[0], K.shape(x)[1]*K.shape(x)[2], K.shape(x)[-1]])

        f = K.conv2d(x,
                     kernel=self.kernel_f,
                     strides=(1, 1), padding='same')  # [bs, h, w, c']
        f = K.bias_add(f, self.bias_f)
        g = K.conv2d(x,
                     kernel=self.kernel_g,
                     strides=(1, 1), padding='same')  # [bs, h, w, c']
        g = K.bias_add(g, self.bias_g)
        h = K.conv2d(x,
                     kernel=self.kernel_h,
                     strides=(1, 1), padding='same')  # [bs, h, w, c]
        h = K.bias_add(h, self.bias_h)

        s = tf.matmul(hw_flatten(g), hw_flatten(f), transpose_b=True)  # [bs, N, N]

        beta = K.softmax(s, axis=-1)  # attention map

        o = K.batch_dot(beta, hw_flatten(h))  # [bs, N, C]

        o = K.reshape(o, shape=K.shape(x))  # [bs, h, w, C]
        x = self.gamma * o + x

        return x

    def compute_output_shape(self, input_shape):
        return input_shape
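
For reference, here is a minimal sketch of how this layer can be dropped into a functional model (the input shape, the surrounding convolution and the channel count below are just placeholders):

from keras.layers import Input, Conv2D
from keras.models import Model

inp = Input(shape=(32, 32, 3))
feat = Conv2D(64, kernel_size=3, padding='same')(inp)  # 64 feature channels
attended = Attention(ch=64)(feat)  # ch must match the channel count of the incoming tensor
model = Model(inp, attended)
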
Hao Chen
  • Could you please share your full code? I tried to use your implementation, but failed, even though I found a working implementation of spectral_norm and hw_flatten. – maniac Aug 23 '18 at 17:43
  • @maniac Hi, sorry for the delay. I have added my code to the EDIT part. It works. – Hao Chen Sep 26 '18 at 09:16

1 Answer


There are several problems with the modifications you made to the original code:

  • You cannot use numpy operations in the middle of your Keras/TF graph. First, because numpy will try to operate on the values directly, while the input tensors will actually be evaluated/receive their values only at graph runtime. Second, because Keras/TF won't be able to back-propagate through non-Keras/TF operations.

    You should replace the original TensorFlow operations with their keras or keras.backend equivalents instead (e.g. tf.matmul() with keras.backend.batch_dot(), tf.nn.softmax() with keras.backend.softmax(), etc.)

  • You are mixing Keras Layers (e.g. Conv2D) and Keras operations (e.g. np/keras.backend.reshape). Keras operations should be wrapped in a Lambda layer to be used alongside other layers, as in the short sketch below.
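
For illustration, here is a minimal sketch of wrapping a backend operation in a Lambda layer so that it can sit between regular layers (the tensor names and shapes below are placeholders, not part of the original code):

from keras.layers import Input, Conv2D, Lambda
from keras import backend as K

inp = Input(shape=(32, 32, 3))
feat = Conv2D(8, kernel_size=1, padding='same')(inp)  # [bs, h, w, c']

# hw_flatten as a proper Keras layer: [bs, h, w, c'] -> [bs, h*w, c']
flat = Lambda(lambda t: K.reshape(t, [K.shape(t)[0], -1, K.shape(t)[-1]]))(feat)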

Since this custom layer has a trainable parameter (gamma), you would need to write your own custom layer, e.g.:

from keras import backend as K
from keras.engine.topology import Layer

class AttentionLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(AttentionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer:
        self.gamma = self.add_weight(name='gamma', shape=[1], initializer='uniform', trainable=True)

        super(AttentionLayer, self).build(input_shape)

    def call(self, x):
        channels = K.int_shape(x)[-1]

        x = activation(x, channels)  # placeholder: use the TF attention implementation here, or reimplement it with Keras backend operations
        return x

    def compute_output_shape(self, input_shape):
        return input_shape
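
As a quick sanity check (assuming the body of call has been filled in with differentiable operations), the trainable scalar should then show up among the layer's weights, e.g.:

from keras.layers import Input

inp = Input(shape=(64, 64, 32))
attn = AttentionLayer(output_dim=32)
out = attn(inp)
print(attn.trainable_weights)  # should list the 'gamma' variable created in build()
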
benjaminplanche
  • Hi, thanks for your instructions. I'm just too bad at TensorFlow. I updated the code implementing the attention block; I hope you can have a look. – Hao Chen Jun 12 '18 at 17:28
  • I actually have two questions here: 1. Can I use a Conv2D layer in the call function as I did? 2. Can I just use the TensorFlow code in the call function? Will it be incorporated into the Keras implementation? – Hao Chen Jun 12 '18 at 17:29
  • **1.** No; as explained in my answer, you can't mix `Layers` and operations in Keras. You should use [`keras.backend.conv2d`](https://keras.io/backend/#conv2d) instead of [`keras.layers.Conv2D`](https://keras.io/layers/convolutional/#conv2d). **2.** Yes, you can use Tensorflow differentiable operations, if you have TF for backend. – benjaminplanche Jun 12 '18 at 17:48
  • Is it okay to use y = self.gamma * x + x? Or do we have to use the backend functions, since the code isn't actually run here but only declares the graph? – Hao Chen Jun 12 '18 at 18:34
  • Hmm yes, it should be okay. The code looks good to me. :) Is it working? – benjaminplanche Jun 12 '18 at 18:35
  • I'm testing it, will tell you when get results – Hao Chen Jun 12 '18 at 18:51
  • @AIdream I'm encountering a problem with hw_flatten: TypeError: Failed to convert object of type to Tensor. Contents: (Dimension(None), Dimension(64), Dimension(8)). Consider casting elements to a supported type. – Hao Chen Jun 12 '18 at 19:43
  • An interesting thing is that when I try the original TF code, I get the same error. – Hao Chen Jun 12 '18 at 20:09
  • Yes, I had to correct some of their code to use it too. The line you are mentioning should be edited as follows: `y = K.reshape(x, shape=(shape[0], -1, shape[-1]))` with `shape = K.shape(x)`. – benjaminplanche Jun 13 '18 at 10:12