
I am trying to build a bidirectional RNN with an attention mechanism for sequence classification. I am having some issues understanding the helper function. I have seen that the one used for training needs the decoder inputs, but as I want a single label for the whole sequence, I don't know exactly what input I should give here. This is the structure that I have built so far:

# Encoder LSTM cells
lstm_fw_cell = rnn.BasicLSTMCell(n_hidden)
lstm_bw_cell = rnn.BasicLSTMCell(n_hidden)

# Bidirectional RNN
outputs, states = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell,
                  lstm_bw_cell, inputs=x, 
                  sequence_length=seq_len, dtype=tf.float32)

# Concatenate forward and backward outputs
encoder_outputs = tf.concat(outputs,2)

# Decoder LSTM cell
decoder_cell = rnn.BasicLSTMCell(n_hidden)

# Attention mechanism
attention_mechanism = tf.contrib.seq2seq.LuongAttention(n_hidden, encoder_outputs)
attn_cell = tf.contrib.seq2seq.AttentionWrapper(decoder_cell,
            attention_mechanism, attention_layer_size=n_hidden,
            name="attention_init")

# Initial attention
attn_zero = attn_cell.zero_state(batch_size=tf.shape(x)[0], dtype=tf.float32)
init_state = attn_zero.clone(cell_state=states[0])

# Helper function
helper = tf.contrib.seq2seq.TrainingHelper(inputs = ???)

# Decoding
my_decoder = tf.contrib.seq2seq.BasicDecoder(cell=attn_cell,
             helper=helper,
             initial_state=init_state)

decoder_outputs, decoder_states = tf.contrib.seq2seq.dynamic_decode(my_decoder)

My input is a sequence of shape [batch_size, sequence_length, n_features] and my output is a single vector with N possible classes, of shape [batch_size, n_classes].

Do you know what I am missing here, or whether it is possible to use seq2seq for sequence classification?

JJChickpeaboy

1 Answer


A Seq2Seq model is by definition not suitable for a task like this. As the name implies, it converts a sequence of inputs (the words in a sentence) to a sequence of labels (the parts of speech of the words). In your case, you are looking for a single label per sample, not a sequence of them.

Fortunately, you have all you need for this already, as you only need the outputs or states of the encoder (the RNN).

The simplest way to build a classifier from this is to use the final state of the RNN. If you concatenate the final forward and backward states, that vector has dimension 2*n_hidden per sample, so the fully connected layer on top of it has shape [2*n_hidden, n_classes]. Apply a softmax to its output and train with a cross-entropy loss that predicts the class.
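A minimal sketch of this, reusing the names from the snippet in your question (states, n_hidden, n_classes) and assuming a one-hot label placeholder y of shape [batch_size, n_classes], which is introduced here only for illustration:

# Final hidden states of the forward and backward LSTMs,
# concatenated into a single [batch_size, 2*n_hidden] vector
final_state = tf.concat([states[0].h, states[1].h], axis=1)

# Fully connected layer projecting to the class logits
logits = tf.layers.dense(final_state, n_classes)

# Softmax cross-entropy loss against the one-hot labels y
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer().minimize(loss)

# Predicted class per sample
predictions = tf.argmax(logits, axis=1)

The decoder, attention wrapper, and helper from your snippet are not needed for this at all.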

In principle, this does not include an attention mechanism. If you want one, you can add it by weighting each output of the RNN with a learned vector and taking the sum, though this is not guaranteed to improve the results. For further reference, https://arxiv.org/pdf/1606.02601.pdf implements this type of attention mechanism, if I'm not mistaken.
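A minimal sketch of that kind of attention pooling, assuming encoder_outputs of shape [batch_size, sequence_length, 2*n_hidden] from your snippet; the variable attn_v is a hypothetical learned parameter introduced here for illustration:

# Learned attention vector, one weight per encoder output dimension
attn_v = tf.get_variable("attn_v", shape=[2 * n_hidden])

# Unnormalized score for each time step: dot product with the attention vector
scores = tf.reduce_sum(encoder_outputs * attn_v, axis=2)   # [batch_size, sequence_length]

# Normalize to attention weights and take the weighted sum of the outputs
alphas = tf.nn.softmax(scores)                              # [batch_size, sequence_length]
context = tf.reduce_sum(encoder_outputs * tf.expand_dims(alphas, 2), axis=1)  # [batch_size, 2*n_hidden]

# The context vector replaces the final state as input to the classification layer
logits = tf.layers.dense(context, n_classes)

For variable-length sequences you would also want to mask the padded time steps (for example with tf.sequence_mask(seq_len)) before the softmax, so that padding receives no attention weight.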

  • I don't agree that seq2seq is not suitable for classification. Here it is used for a classification task: https://andriymulyar.com/blog/bert-document-classification – artona Oct 14 '20 at 07:50