To my knowledge, attention has mostly been used in encoder-decoder models. I am trying to use it as a layer in a feedforward neural network, with the following architecture:
Input Layer -> Dense Layer -> Self-Attention Layer -> Dense Layer -> Softmax Layer
Code (attention layer only):
from tensorflow.keras.layers import Input, MultiHeadAttention

# Two attention heads, each projecting queries and keys to 2 dimensions.
layer = MultiHeadAttention(num_heads=2, key_dim=2)
# Query sequence of length 8, value/key sequence of length 4, 16 features each.
target = Input(shape=(8, 16))
source = Input(shape=(4, 16))
output_tensor, weights = layer(target, source, return_attention_scores=True)
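With different target and source tensors, this call computes cross-attention. For the self-attention layer in my architecture, I assume the same tensor would be passed as both query and value, e.g.:

# Self-attention: the same tensor serves as query and value,
# so the layer attends over its own positions.
x = Input(shape=(8, 16))
self_attn = MultiHeadAttention(num_heads=2, key_dim=2)
self_output, self_weights = self_attn(x, x, return_attention_scores=True)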
I think the input to the attention layer should be the output of the first dense layer. What should I do with the outputs of the attention layer, i.e. output_tensor and weights?
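To make the question concrete, here is roughly how I imagine wiring the full model; the layer sizes, the input shape, and the Flatten before the softmax are just placeholder guesses:

from tensorflow.keras.layers import Dense, Flatten, Input, MultiHeadAttention
from tensorflow.keras.models import Model

# Placeholder shapes: 8 timesteps, 16 features, 10 output classes.
inputs = Input(shape=(8, 16))
x = Dense(32, activation="relu")(inputs)        # first dense layer
attn = MultiHeadAttention(num_heads=2, key_dim=2)
# Self-attention over the dense features.
attn_out, attn_weights = attn(x, x, return_attention_scores=True)
# attn_weights is not used in the graph; presumably only for inspection.
x = Dense(32, activation="relu")(attn_out)      # second dense layer
x = Flatten()(x)                                # collapse the sequence axis
outputs = Dense(10, activation="softmax")(x)    # softmax layer
model = Model(inputs=inputs, outputs=outputs)
model.summary()

Is it correct that output_tensor simply feeds the next dense layer, while weights (the attention scores) are only useful for inspection or visualization and can otherwise be discarded?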