I'm trying to figure out how to backpropagate through a GRU recurrent network, but I'm having trouble understanding the GRU architecture precisely.
The image below shows a GRU cell with 3 neural networks, receiving the concatenated previous hidden state and the input vector as its input.
The image I referenced for backpropagation, however, shows the inputs being forwarded into W and U for each of the gates, added together, and then passed through the appropriate activation functions.
The equation for the update gate shown on Wikipedia, as an example, is:

z_t = sigmoid(W_z * x_t + U_z * h_{t-1})
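To make the notation concrete, this is how I currently read that equation for just the update gate (a minimal NumPy sketch; the dimensions and variable names are my own assumptions):

```python
import numpy as np

hidden_size, input_size = 4, 3          # assumed toy dimensions

# W_z multiplies the input x_t, U_z multiplies the previous hidden state h_{t-1}
W_z = np.random.randn(hidden_size, input_size)
U_z = np.random.randn(hidden_size, hidden_size)

x_t = np.random.randn(input_size)       # current input vector
h_prev = np.zeros(hidden_size)          # previous hidden state h_{t-1}

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# z_t = sigmoid(W_z x_t + U_z h_{t-1})
z_t = sigmoid(W_z @ x_t + U_z @ h_prev)
print(z_t.shape)                        # (hidden_size,)
```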
Can somebody explain to me what W and U represent here?
EDIT:
In most of the sources I found, W and U are usually just referred to as "weights", so my best guess is that W and U each represent their own neural network, but that would contradict the first image I found.
If somebody could give an example of how W and U would work in a simple GRU, that would be very helpful. For reference, I've sketched my current best guess below, so corrections to it would answer the question.
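This is my own sketch of a full GRU step, with separate W (input) and U (hidden-state) matrices per gate and biases omitted, following the update-gate equation above. It is not taken from either linked source, so any of it may be wrong:

```python
import numpy as np

def gru_step(x_t, h_prev, params):
    """One GRU step as I currently understand the equations.
    params holds separate W (input) and U (hidden) matrices per gate."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    # update gate: z_t = sigmoid(W_z x_t + U_z h_{t-1})
    z_t = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev)
    # reset gate:  r_t = sigmoid(W_r x_t + U_r h_{t-1})
    r_t = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev)
    # candidate state: h~_t = tanh(W_h x_t + U_h (r_t * h_{t-1}))
    h_hat = np.tanh(params["W_h"] @ x_t + params["U_h"] @ (r_t * h_prev))
    # new hidden state: blend of old state and candidate, weighted by z_t
    # (conventions differ on which term gets z_t vs. 1 - z_t)
    return (1.0 - z_t) * h_prev + z_t * h_hat

# toy example with assumed sizes
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
params = {
    name: rng.standard_normal(
        (hidden_size, input_size if name.startswith("W") else hidden_size)
    )
    for name in ["W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]
}
h = gru_step(rng.standard_normal(input_size), np.zeros(hidden_size), params)
print(h)
```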
Sources for the images: https://cran.r-project.org/web/packages/rnn/vignettes/GRU_units.html https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45