I'm trying to understand the code from the second part (Q-learning + NN) of this article: https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0
1) Why do we train the network at all? Wouldn't it be simpler to write targetQ[0,a[0]] directly into the weight matrix, the way the table-based version just overwrites its Q-table entry?

2) Why, after a training step, is W[s,a[0]] != targetQ[0,a[0]], and consequently loss != 0?
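To make question 2 concrete, here is a minimal NumPy sketch of what I understand one training step does. It assumes the single-layer network from the article, where Q(s, ·) = one_hot(s) @ W, so Q[s,a] is literally the weight W[s,a]; the state/action indices, target value, and learning rate below are hypothetical, just for illustration:

```python
import numpy as np

# Single-layer Q-network: Q(s, .) = one_hot(s) @ W, so Q[s, a] == W[s, a].
n_states, n_actions = 16, 4
rng = np.random.default_rng(0)
W = rng.uniform(0, 0.01, size=(n_states, n_actions))  # small random init

s, a = 3, 2       # hypothetical state and chosen action
targetQ = 1.0     # hypothetical TD target: r + gamma * max_a' Q(s', a')
lr = 0.1          # learning rate of the gradient-descent optimizer

# One gradient-descent step on loss = (targetQ - W[s, a])**2:
# d(loss)/dW[s, a] = -2 * (targetQ - W[s, a])
grad = -2.0 * (targetQ - W[s, a])
W[s, a] -= lr * grad

# The weight moved only a fraction (2 * lr = 20%) of the way toward the
# target, so the loss shrank but did not reach zero.
loss = (targetQ - W[s, a]) ** 2
print(W[s, a], loss)
```

If this sketch is right, then a single optimizer step only nudges W[s,a] toward targetQ instead of overwriting it, which would explain why loss != 0 after training. Is that the correct reading?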