
Recently, I've come across some papers on generative recurrent models. They all attach sub-networks (prior/encoder/decoder, etc.) to a well-known LSTM cell, composing them into a new kind of RNN cell.

I'm curious whether gradient vanishing/exploding happens in these new RNN cells. Isn't there any problem with that kind of combination?

References:

They all seem to follow the same pattern described above.

A Recurrent Latent Variable Model for Sequential Data (Chung et al., 2015)

Learning Stochastic Recurrent Networks (Bayer and Osendorfer, 2014)

Z-Forcing: Training Stochastic Recurrent Networks (Goyal et al., 2017)

Pseudocode

The pseudocode for the recurrent cell is below:

def new_rnncell_call(x, htm1):
    # prior_net / posterior_net / decoder_net are each a single layer or an MLP
    q_prior = prior_net(htm1)         # prior step: p(z_t | h_{t-1})
    q = posterior_net([htm1, x])      # inference step: q(z_t | x_t, h_{t-1})
    z = sample_from(q)                # sample z_t via the reparameterization trick
    target_dist = decoder_net(z)      # generation step: p(x_t | z_t)
    ht = innerLSTM([z, x], htm1)      # recurrent step through the standard LSTM cell
    return [q_prior, q, target_dist], ht
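
For concreteness, here is a minimal runnable sketch of this pattern, assuming PyTorch; the diagonal-Gaussian parameterization, the single-linear-layer sub-networks, and all layer sizes are my own illustrative choices, not details taken from the papers:

import torch
import torch.nn as nn

class VRNNishCell(nn.Module):
    """Illustrative cell: prior/posterior/decoder sub-networks around an LSTMCell."""
    def __init__(self, x_dim, z_dim, h_dim):
        super().__init__()
        # prior p(z_t | h_{t-1}) and posterior q(z_t | x_t, h_{t-1}), both diagonal Gaussians
        self.prior_net = nn.Linear(h_dim, 2 * z_dim)
        self.posterior_net = nn.Linear(h_dim + x_dim, 2 * z_dim)
        # decoder p(x_t | z_t); here just a linear map to the mean of x_t
        self.decoder_net = nn.Linear(z_dim, x_dim)
        # the well-known recurrent core, fed with [z_t, x_t]
        self.inner_lstm = nn.LSTMCell(z_dim + x_dim, h_dim)

    def forward(self, x, state):
        h, c = state
        prior_mu, prior_logvar = self.prior_net(h).chunk(2, dim=-1)        # prior step
        post_mu, post_logvar = self.posterior_net(
            torch.cat([h, x], dim=-1)).chunk(2, dim=-1)                    # inference step
        eps = torch.randn_like(post_mu)
        z = post_mu + torch.exp(0.5 * post_logvar) * eps                   # reparameterization trick
        x_recon = self.decoder_net(z)                                      # generation step
        h_new, c_new = self.inner_lstm(torch.cat([z, x], dim=-1), (h, c))  # recurrent step
        return (prior_mu, prior_logvar), (post_mu, post_logvar), x_recon, (h_new, c_new)

In the referenced papers, the returned prior/posterior parameters and the reconstruction would feed a KL term and a reconstruction term of an ELBO-style loss.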

What concerns me are the "naked" weights outside the well-known LSTM (or GRU, etc.) cell during BPTT: unlike the weights inside the LSTM, their activations are not protected by any gating logic. To me this does not look like stacked RNN layers, or like extra dense layers applied only to the outputs.

Doesn't that cause any gradient vanishing/exploding problems?
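
To make the concern concrete, here is a hypothetical unrolled training step over the sketch above; the reconstruction-only MSE loss is a placeholder for illustration (the papers optimize an ELBO with a KL term), but it already puts the posterior/decoder weights on the BPTT path:

# Hypothetical unrolled BPTT over T steps using the illustrative cell above.
cell = VRNNishCell(x_dim=8, z_dim=4, h_dim=16)
xs = torch.randn(20, 3, 8)                    # (T, batch, x_dim), random stand-in data
h = torch.zeros(3, 16)
c = torch.zeros(3, 16)
loss = torch.tensor(0.0)
for x_t in xs:                                # unroll: gradients flow back through every step
    _, _, x_recon, (h, c) = cell(x_t, (h, c))
    loss = loss + ((x_recon - x_t) ** 2).mean()
loss.backward()                               # BPTT reaches the sub-network weights too
print(cell.posterior_net.weight.grad.norm())  # gradient on a weight outside the LSTM
print(cell.inner_lstm.weight_hh.grad.norm())  # gradient on the gated recurrent weights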

Comments:
  • This is a great question, but I think it might fit better somewhere else, since it's really about statistics/machine learning, rather than a particular coding implementation...? – gmds Apr 03 '19 at 03:33
  • @gmds Thanks! As I'm completely new to Stack Overflow, which category fits this question best? Or do you mean outside of this site? – Sehee Park Apr 03 '19 at 03:56
  • I think you might want to check out [Data Science](https://datascience.stackexchange.com/), which is a related site in the Stack Exchange network. All the best! – gmds Apr 03 '19 at 03:57

0 Answers