This happened because the input data was too large. The sigmoid activation function saturated: f(x) converges to 1 for x -> inf, so every output was practically 1. I had to normalize the data
e.g.:
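import numpy as np
a = np.array([1, 2, 3, 4, 5], dtype=float)  # float dtype, otherwise the in-place division fails on an int array
a /= a.max()  # divide by the maximum so the largest value becomes 1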
or avoid generating unnormalized data in the first place.
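To make the saturation visible, here is a minimal sketch. The input values and the names `raw`, `scaled`, `grad_saturated` and `grad_usable` are made up for illustration and are not taken from my network. It compares the sigmoid output and the gradient factor y*(1-y) for large inputs and for the same inputs divided by their maximum.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw inputs, far outside the range where the sigmoid is sensitive.
raw = np.array([10.0, 200.0, 350.0, 500.0])
saturated = sigmoid(raw)          # all values are practically 1

# The same inputs after dividing by the maximum.
scaled = raw / raw.max()
usable = sigmoid(scaled)          # values are spread out between 0.5 and 0.75

# Gradient factor y * (1 - y): almost zero when the sigmoid is saturated,
# so the weights barely change during training.
grad_saturated = saturated * (1.0 - saturated)
grad_usable = usable * (1.0 - usable)

print(saturated, grad_saturated)
print(usable, grad_usable)
```

With the raw inputs the gradient factor is close to zero everywhere, which is why the network could not learn anything from them.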
Also, the intermediate value was updated BEFORE the sigmoid was applied. But the derivative of the sigmoid looks like this: y'(x) = y(x)*(1-y(x)). In my case it was just: y'(x) = x*(1-x), i.e. it was computed from the raw input instead of the sigmoid output.
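The derivative of the sigmoid is the product y(x)*(1-y(x)), where y(x) is the sigmoid output. The following sketch shows the difference between using the output and using the raw input. The helper name `sigmoid_derivative_from_output` and the sample values are only for illustration; this is not the code from my network.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative_from_output(y):
    """Derivative of the sigmoid, expressed through its output y = sigmoid(x)."""
    return y * (1.0 - y)

x = np.array([-2.0, 0.0, 3.0])   # hypothetical pre-activation values
y = sigmoid(x)                   # activations: apply the sigmoid FIRST

correct = sigmoid_derivative_from_output(y)   # y(x) * (1 - y(x))
wrong = x * (1.0 - x)                         # the bug: raw x instead of y(x)

# Central finite difference as an independent check of the correct formula.
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)

print(correct)
print(wrong)
print(numeric)
```

The correct derivative is always positive and at most 0.25, while the buggy version can become negative or arbitrarily large, so the deltas point in the wrong direction.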
There were also errors in how I updated the weights after calculating the deltas. I rewrote the whole loop following a tutorial on neural networks in Python, and then it worked.
It still does not support a bias, but it can do classification. For regression it's not precise enough, but I guess this has to do with the missing bias.
Here is the code:
http://pastebin.com/hRCKe1dK
Someone suggested that I put my training data into a neural-network framework and see if it works. It didn't. So it was fairly clear that the problem had to do with the data, and that gave me the idea that it should be between -1 and 1.
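One possible way to bring training data into the range between -1 and 1 is min-max scaling per column. This is only a sketch: the function name `scale_to_unit_range` and the sample data are assumptions for illustration, not what the linked code does.

```python
import numpy as np

def scale_to_unit_range(data):
    """Min-max scale each column of `data` linearly into [-1, 1]."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    # Use a span of 1 for constant columns to avoid dividing by zero;
    # such a column ends up as all -1.
    span = np.where(hi > lo, hi - lo, 1.0)
    return 2.0 * (data - lo) / span - 1.0

# Hypothetical training samples: one row per sample, one column per feature.
samples = np.array([[1.0, 100.0],
                    [3.0, 300.0],
                    [5.0, 500.0]])
scaled = scale_to_unit_range(samples)
print(scaled)
```

Each feature is scaled independently, so a feature with large values no longer dominates the others or saturates the sigmoid.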