
I want to model the following:

y(t) = F(x(t-1), x(t-2), ..., x(t-k))

In other words, a function whose current output depends on the last k inputs.

1- I know one way is to use a classic neural network that takes the k inputs {x(t-1), x(t-2), ..., x(t-k)} for each y(t) and train it. What, then, is the benefit of using an RNN to solve this problem?

2- Assuming I use an RNN, should I feed it only x(t) (or x(t-1)) and assume the hidden layer(s) can find the relation of y(t) to the past k inputs by holding them in memory (the hidden state)?

3- Do deep nets such as a deep RNN or an LSTM offer any real benefit for this problem, given that we want to estimate the output from the last k inputs?

Bob

1 Answer

  1. I would not advise using a classic vanilla RNN. Theoretically it can store information from previous inputs in its memory, but in practice it requires an exponentially large number of nodes.
  2. This applies to classic vanilla implementations as well as modern architectures (e.g. LSTM or GRU): it depends on whether you want a unidirectional or a bidirectional model. If you want to predict the next step, a unidirectional architecture is usually better. If you want to better analyze given sequences, I advise a bidirectional one.
  3. LSTMs and GRUs use additional memory cells, which help keep long-term dependencies between inputs in memory. They are considered the best architectures right now. Deep RNNs are usually deep networks with recurrent topologies; they make use of their depth in the same manner as feedforward neural nets.
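
To make the two input layouts from the question concrete, here is a minimal NumPy sketch (not tied to any particular framework). It builds the k lagged inputs a feedforward net would take as one flat vector, then reshapes the same values into a length-k sequence that a recurrent layer would read one step at a time. The helper names `lagged_inputs` and `rnn_last_hidden`, the toy series, and the random untrained weights are all assumptions made for illustration.

```python
import numpy as np

def lagged_inputs(x, k):
    """Row i holds [x(t-1), x(t-2), ..., x(t-k)] for t = k + i."""
    x = np.asarray(x, dtype=float)
    if not 1 <= k < len(x):
        raise ValueError("k must satisfy 1 <= k < len(x)")
    # Column j-1 is the series shifted back by j steps.
    return np.stack([x[k - j : len(x) - j] for j in range(1, k + 1)], axis=1)

def rnn_last_hidden(seqs, W_x, W_h, b):
    """Vanilla RNN forward pass; returns the hidden state after the last step."""
    h = np.zeros((seqs.shape[0], W_h.shape[0]))
    for step in range(seqs.shape[1]):
        # The hidden state is the only memory carried from one step to the next.
        h = np.tanh(seqs[:, step, :] @ W_x + h @ W_h + b)
    return h

k = 3
x = np.arange(6.0)                   # toy series x(0), ..., x(5)

# Feedforward layout: one flat vector of k features per target y(t).
X_ff = lagged_inputs(x, k)           # shape (3, 3), most recent input first

# Recurrent layout: the same values as a sequence, oldest input first,
# with one feature per time step.
X_rnn = X_ff[:, ::-1, np.newaxis]    # shape (3, 3, 1)

# Random, untrained weights, only to show the shapes involved.
rng = np.random.default_rng(0)
hidden = 4
W_x = rng.normal(scale=0.5, size=(1, hidden))
W_h = rng.normal(scale=0.5, size=(hidden, hidden))
b = np.zeros(hidden)
h_last = rnn_last_hidden(X_rnn, W_x, W_h, b)   # shape (3, 4)
```

In the feedforward layout the window length k is fixed by the input dimension. In the recurrent layout the network sees one x per step, and whatever it retains about earlier inputs has to live in the hidden state.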
Marcin Możejko
  • Thank you for the advice. 1- I think a bi-RNN doesn't fit my problem, as I want to predict the probability of the next best decision based on the previous decisions (to put it another way). 2- Assuming I use an LSTM, I'm still not sure whether to feed all k previous instances of x(t) as separate inputs to the network, or whether using only the last x(t-1) as the single input would be sufficient, and whether the network can keep track of the last k seen x(t) in its architecture. – Bob May 05 '16 at 09:11
  • Simply treat it as a hyperparameter and try a grid search to find the best setup :) – Marcin Możejko May 05 '16 at 19:46
  • If my answer was useful, please accept it as correct or vote it up :) – Marcin Możejko May 17 '16 at 15:17
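
The grid search suggested in the comments could look roughly like the sketch below. It treats the window length k as a hyperparameter and compares validation error across candidate values. To keep the example small and dependency-free, a linear least-squares fit stands in for the network; the synthetic series (whose true dependence is on the last 3 inputs), the helper `lagged`, and the train/validation split are all assumptions chosen for illustration.

```python
import numpy as np

def lagged(x, k, start):
    """Rows [x(t-1), ..., x(t-k)] for t = start, ..., len(x) - 1."""
    return np.stack([x[start - j : len(x) - j] for j in range(1, k + 1)], axis=1)

# Synthetic data (assumption): y(t) depends on exactly the last 3 inputs.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros_like(x)
y[3:] = 0.5 * x[2:-1] - 0.3 * x[1:-2] + 0.2 * x[:-3]
y += 0.01 * rng.normal(size=x.shape)

k_values = range(1, 7)
k_max = max(k_values)
targets = y[k_max:]          # same targets for every k, so errors are comparable
n_train = 400

errors = {}
for k in k_values:
    X = lagged(x, k, k_max)
    X_train, X_val = X[:n_train], X[n_train:]
    y_train, y_val = targets[:n_train], targets[n_train:]
    # Stand-in model: ordinary least squares instead of a neural network.
    weights = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
    errors[k] = float(np.mean((X_val @ weights - y_val) ** 2))

best_k = min(errors, key=errors.get)
print(errors, best_k)
```

With a real network the loop is the same: rebuild the inputs for each k, retrain, and keep the k with the lowest validation error.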