In TensorFlow or Theano, you only tell the library what your neural network looks like and how the feed-forward pass should operate.
For instance, in TensorFlow, you would write:
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # training data as constants in the graph (X, y are NumPy arrays)
    _X = tf.constant(X)
    _y = tf.constant(y)
    hidden = 20

    # hidden layer: weights, biases, activation
    w0 = tf.Variable(tf.truncated_normal([X.shape[1], hidden]))
    b0 = tf.Variable(tf.truncated_normal([hidden]))
    h = tf.nn.softmax(tf.matmul(_X, w0) + b0)

    # output layer
    w1 = tf.Variable(tf.truncated_normal([hidden, 1]))
    b1 = tf.Variable(tf.truncated_normal([1]))
    yp = tf.nn.softmax(tf.matmul(h, w1) + b1)

    # L2 loss and one gradient-descent step that minimizes it
    loss = tf.reduce_mean(0.5 * tf.square(yp - _y))
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
I am using an L2 loss function, C = 0.5 * sum((y - yp)^2), and in the backpropagation step the derivative dC/dyp = yp - y will presumably have to be computed. See equation (30) in this book.
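To make the derivative concrete, here is a quick NumPy sanity check of my own (with made-up numbers, not part of the network above) that dC/dyp = yp - y agrees with a finite-difference approximation:

import numpy as np

# made-up data, just to check the derivative of C = 0.5 * sum((y - yp)^2)
y = np.array([1.0, 0.0, 1.0])
yp = np.array([0.8, 0.3, 0.6])

C = lambda p: 0.5 * np.sum((y - p) ** 2)

analytic = yp - y  # dC/dyp, component-wise

eps = 1e-6
numeric = np.array([
    (C(yp + eps * np.eye(3)[i]) - C(yp - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])

print(analytic)   # [-0.2  0.3 -0.4]
print(numeric)    # approximately the same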
My question is: how can TensorFlow (or Theano) know the analytical derivative needed for backpropagation? Or do they use an approximation? Or do they somehow avoid using the derivative altogether?
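If it helps clarify what I am asking: here is how I would inspect whatever gradients TensorFlow produces (my own sketch, assuming the TF 1.x tf.gradients API and continuing the graph defined above):

with graph.as_default():
    # ask TensorFlow for d(loss)/d(variable); these come back as tensors
    # added to the same graph, not as numerical approximations
    grads = tf.gradients(loss, [w0, b0, w1, b1])

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    grad_values = sess.run(grads)  # concrete gradient values for the current weights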
I have done the Udacity deep learning course on TensorFlow, but I am still not sure how to make sense of how these libraries work.