I wouldn't say it does automatic "unfolding" - rather, Theano builds a symbolic graph that keeps track of which variables are connected, and gradients and updates can be propagated along that chain. If this is what you mean by unfolding, then maybe we are talking about the same thing.
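As a tiny illustration of what I mean by connectedness (my own toy example, not from the tutorial): cost below is connected to the shared variable W through the graph, so TT.grad can walk that chain and theano.function can apply the resulting update.

import numpy as np
import theano
import theano.tensor as TT

x = TT.vector('x')
W = theano.shared(np.ones((3, 3), dtype=theano.config.floatX), name='W')
cost = (TT.dot(x, W) ** 2).sum()   # cost is connected to W via the dot
gW = TT.grad(cost, W)              # Theano follows the chain cost -> dot -> W
f = theano.function([x], cost, updates=[(W, W - 0.01 * gW)])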
I am stepping through this as well, but using Razvan Pascanu's rnn.py code (from this thread) for reference. It seems much more straightforward for a learning example.
You might gain some value from visualizing/drawing the graphs from the tutorial. There is also a set of slides online with a simple drawing that shows the diagram for a 1-layer "unfolding" of an RNN, which you discuss in your post.
Specifically, look at the step function:
def step(u_t, h_tm1, W, W_in, W_out):
    # TT is theano.tensor; h_tm1 is the hidden state from the previous time step
    h_t = TT.tanh(TT.dot(u_t, W_in) + TT.dot(h_tm1, W))
    # the output is a linear readout of the new hidden state
    y_t = TT.dot(h_t, W_out)
    return h_t, y_t
This function represents the "simple recurrent net" shown in these slides, pg 10. When you do the updates, you simply take the gradient of the cost w.r.t. W, W_in, and W_out, respectively (remember that y is connected to all three via the step function! This is how the gradient magic works).
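To make that concrete, here is a rough sketch of how step plugs into theano.scan and how those gradients become updates. The parameter shapes, initialization, and squared-error cost are my own assumptions for the sake of a runnable example, not exactly what rnn.py does:

import numpy as np
import theano
import theano.tensor as TT

n_in, n_h, n_out = 2, 10, 1
rng = np.random.RandomState(0)
W = theano.shared(rng.uniform(-0.1, 0.1, (n_h, n_h)).astype(theano.config.floatX), name='W')
W_in = theano.shared(rng.uniform(-0.1, 0.1, (n_in, n_h)).astype(theano.config.floatX), name='W_in')
W_out = theano.shared(rng.uniform(-0.1, 0.1, (n_h, n_out)).astype(theano.config.floatX), name='W_out')

u = TT.matrix('u')      # input sequence, shape (time, n_in)
t = TT.matrix('t')      # target sequence, shape (time, n_out)
h0 = TT.vector('h0')    # initial hidden state
lr = TT.scalar('lr')

# scan applies step at every time step, feeding h_t back in as h_tm1
[h, y], _ = theano.scan(step,
                        sequences=u,
                        outputs_info=[h0, None],
                        non_sequences=[W, W_in, W_out])

cost = ((y - t) ** 2).sum()
# the gradients flow back through scan to the three weight matrices
gW, gW_in, gW_out = TT.grad(cost, [W, W_in, W_out])

train = theano.function([u, t, h0, lr], cost,
                        updates=[(W, W - lr * gW),
                                 (W_in, W_in - lr * gW_in),
                                 (W_out, W_out - lr * gW_out)])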
If you had multiple W layers (or indices into one big W, as I believe gwtaylor is doing), then that would create multiple layers of "unfolding". From what I understand, this network only looks 1 step backward in time. If it helps, theanets also has an RNN implementation in Theano.
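For example, a two-layer version of step might look something like this (the names and wiring are purely hypothetical, just to illustrate the stacking idea):

def step2(u_t, h1_tm1, h2_tm1, W1, W_in, W2, W_12, W_out):
    # first hidden layer: driven by the input and its own previous state
    h1_t = TT.tanh(TT.dot(u_t, W_in) + TT.dot(h1_tm1, W1))
    # second hidden layer: driven by the first layer and its own previous state
    h2_t = TT.tanh(TT.dot(h1_t, W_12) + TT.dot(h2_tm1, W2))
    y_t = TT.dot(h2_t, W_out)
    return h1_t, h2_t, y_t

In the scan call you would then use outputs_info=[h1_0, h2_0, None] so that both hidden states get fed back.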
As an additional note, training RNNs with BPTT is hard. Ilya Sutskever's dissertation discusses this at great length - if you can, try to tie into a Hessian-Free optimizer; there is also a reference RNN implementation here. Theanets also does this, and may be a good reference.