
I am currently using a generative RNN to classify indices in a sequence (sort of saying whether something is noise or not noise).

My input is continuous (i.e. a real value between 0 and 1) and my output is binary (either 0 or 1).

For example, if the model marks a 1 for numbers greater than 0.5 and 0 otherwise,

[.21, .35, .78, .56, ..., .21] => [0, 0, 1, 1, ..., 0]:

   0     0     1     1          0
   ^     ^     ^     ^          ^
   |     |     |     |          |
o->L1  ->L2  ->L3  ->L4 ->... ->L10
   ^     ^     ^     ^          ^
   |     |     |     |          |
   .21  .35   .78   .56   ...  .21
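
Just for illustration, the toy labels above come from a simple threshold (a NumPy sketch, not part of the model):

import numpy as np

x = np.array([.21, .35, .78, .56, .21])
labels = (x > 0.5).astype(int)   # -> [0, 0, 1, 1, 0]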

Using

import tensorflow as tf

n_steps = 10
n_inputs = 1
n_neurons = 7
n_outputs = 1   # one binary (0/1) label per time step
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu)
rnn_outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

rnn_outputs becomes a tensor of shape (?, 10, 7), presumably 7 outputs for each of the 10 time steps.

Previously, I ran the following snippet on the output-projection-wrapped rnn_outputs to get a single classification label per sequence.

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,logits=logits)

loss = tf.reduce_mean(xentropy)
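
For context, the logits in that earlier setup came from something roughly like this, building on the definitions above (a sketch from memory, assuming tf.contrib.rnn.OutputProjectionWrapper and a single class label per sequence; not my exact code):

projected_cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu),
    output_size=2)                           # 2 classes is an assumption
rnn_outputs, states = tf.nn.dynamic_rnn(projected_cell, X, dtype=tf.float32)
logits = rnn_outputs[:, -1, :]               # projected output of the last time step
y_seq = tf.placeholder(tf.int32, [None])     # one label per sequence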

How would I run something similar on rnn_outputs to get a label for every step in the sequence?

Specifically,

1. Can I get the rnn_output from each step and feed it into a softmax?

curr_state = rnn_outputs[:, i, :]
logits = tf.layers.dense(curr_state, n_outputs)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)

2. What loss function should I use, and should it be applied across every value of every sequence (i.e., for sequence i and step j, loss = y_{ij}(true) - y_{ij}(predicted))?

Should my loss be loss = tf.reduce_mean(np.sum(xentropy))?

EDIT: It seems I am trying to implement, in TensorFlow, something similar to what is described in https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/.

In Keras, there's a TimeDistributed wrapper:

You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps, independently

How would I go about implementing something similar in TensorFlow?
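
Something like the following Keras sketch is the behaviour I'm trying to reproduce (my own paraphrase of the TimeDistributed pattern, not code copied from that article):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
model.add(LSTM(7, input_shape=(10, 1), return_sequences=True))   # outputs shape (batch, 10, 7)
model.add(TimeDistributed(Dense(1, activation='sigmoid')))       # one 0/1 prediction per time step
model.compile(loss='binary_crossentropy', optimizer='adam')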


1 Answer


First up, it looks like you're doing seq-to-seq modelling. For this kind of problem it's usually a good idea to go with an encoder-decoder architecture rather than predict the sequence from the same RNN. TensorFlow has a big tutorial about it under the name "Neural Machine Translation (seq2seq) Tutorial", which I'd recommend checking out.

However, the architecture that you're asking about is also possible, provided that n_steps is known statically (despite using dynamic_rnn). In this case, you can compute the cross-entropy of each cell's output and then sum up all the losses. It's possible if the RNN length is dynamic as well, but it would be hairier. Here's the code:

import tensorflow as tf

n_steps = 2
n_inputs = 3
n_neurons = 5

X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs], name='x')
y = tf.placeholder(dtype=tf.int32, shape=[None, n_steps], name='y')
basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

# Transpose so that `time` is axis 0
time_based_outputs = tf.transpose(outputs, [1, 0, 2])
time_based_labels = tf.transpose(y, [1, 0])
losses = []
for i in range(n_steps):
  cell_output = time_based_outputs[i]   # output at step i; apply further dense layers here if needed
  labels = time_based_labels[i]         # get the label (sparse)
  loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=cell_output)
  losses.append(loss)                   # collect all losses
total_loss = tf.reduce_sum(losses)      # compute the total loss
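
If you'd rather avoid the Python loop, the same idea can be written in a vectorized form. This is a sketch (n_classes = 2 is an assumption for your binary labels); note that tf.layers.dense applied to a rank-3 tensor shares its weights across time steps, which is also the closest TensorFlow analogue of Keras' TimeDistributed(Dense):

n_classes = 2
logits = tf.layers.dense(outputs, n_classes)   # (batch, n_steps, n_classes), same weights at every step
step_losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)  # (batch, n_steps)
total_loss = tf.reduce_mean(step_losses)       # mean over batch and steps (a reduce_sum, as above, works too)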