
I am giving a short presentation on neural networks on Tuesday to my fellow student web developers. I was hoping to translate this code (under Part 1, a tiny toy neural network: 2 layer network) to JavaScript so it would be more recognizable to my audience.

import numpy as np

# sigmoid function
def nonlin(x,deriv=False):
    if(deriv==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))

# input dataset
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])

# output dataset            
y = np.array([[0,0,1,1]]).T

# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for iter in range(10000):

    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))

    # how much did we miss?
    l1_error = y - l1

    # multiply how much we missed by the 
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1,True)

    # update weights
    syn0 += np.dot(l0.T,l1_delta)

print "Output After Training:"
print l1

Here's my JavaScript code as it stands now. I just de-ES6ified it to get it to run in my IDE:

const _ = require('lodash')
const m = require('mathjs')

const sigmoid = function(z) { return 1.0 / (1.0 + Math.exp(-z)) }

const sigmoid_prime = function(z) { return sigmoid(z) * (1 - sigmoid(z)) }

var X = m.matrix([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
var y = m.transpose(m.matrix([[0,1,1,0]]))

var syn0 = m.random([3, 1], -1, 1)

var l0, l1, l1_delta, l1_error

_.range(10000).forEach(function() {

    l0 = X;
    l1 = m.map(m.multiply(l0, syn0), sigmoid)
    l1_error = m.subtract(y, l1)
    l1_delta = m.dotMultiply(l1_error, m.map(l1, sigmoid_prime))
    syn0 = m.multiply(m.transpose(l0),l1_delta)
})

console.log("Output After Training:")
console.log(l1)

As you can see, I'm using mathjs as a substitute for numpy. I have tried to read the documentation for mathjs and numpy carefully and not mix up my matrix multiplication with my elementwise multiplication, but something is very broken: I get .5 for every output. I have stepped through my program in the debugger and compared values side by side in a Python scratch file, starting Python off with the same syn0 values the JavaScript program generated, and it seems that it's here, at the backpropagation line, that the two versions slightly diverge (and perhaps diverge more over iterations): l1_delta = m.dotMultiply(l1_error, m.map(l1, sigmoid_prime)). But I can't see why.
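To show what I mean about keeping the two kinds of multiplication straight, here's the minimal sanity check I've been going by, with made-up values rather than values from my actual run. My understanding is that m.multiply on two matrices acts like numpy's dot, and m.dotMultiply acts like numpy's elementwise *:

const m = require('mathjs')

// 4x3 times 3x1: m.multiply is the matrix product, like numpy's dot,
// so this should come out 4x1. (Toy values, not from my program.)
var a = m.matrix([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
var w = m.matrix([ [0.5],[-0.5],[0.25] ])
console.log(m.multiply(a, w))      // => a 4x1 matrix

// Two 4x1 matrices elementwise: m.dotMultiply is like numpy's *,
// so the shape stays 4x1 and entries multiply pairwise.
var u = m.matrix([ [1],[2],[3],[4] ])
var v = m.matrix([ [10],[20],[30],[40] ])
console.log(m.dotMultiply(u, v))   // => [[10],[40],[90],[160]]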

EDIT: I should have updated my code before I posted to reflect that, in the last version, I changed the y definition to var y = m.matrix([ [0], [0], [1], [1] ]). That slightly changed the problem: the output switched from being all .5's to floats slightly off of .5.

SECOND EDIT: Brent rightly points out in the comments that I have a bug: to imitate the code I'm porting from, my sigmoid prime function only needs to be z*(1-z). I had missed that wrinkle. Sadly, this doesn't make a difference. Here's the console output of the stringified function and the value of syn0 in the last iteration.
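The logging is something like this (reconstructed from the output below; the exact statements are my best guess):

// logged once in the final iteration of the loop
console.log('sigmoid prime is ' + sigmoid_prime.toString())
console.log('syn0 is', syn0)

With the original function, that prints: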

sigmoid prime is function (z) {return sigmoid(z) * (1 - sigmoid(z))}
syn0 is Matrix {
  _data: 
   [ [ 0.21089543115482337 ],
     [ -0.010100491415226356 ],
     [ -0.021376195229226028 ] ],
  _size: [ 3, 1 ],
  _datatype: undefined }

Now changing the function:

sigmoid prime is function (z) { return z * (1 - (z)) }
syn0 is Matrix {
  _data: 
   [ [ 0.2235282818415481 ],
     [ -0.010714305064562765 ],
     [ -0.022890185954402634 ] ],
  _size: [ 3, 1 ],
  _datatype: undefined }
  • I'm going to throw in a guess here - are your weights correctly typed to floats/doubles and not ints etc? – Monza Jun 18 '17 at 23:33
  • If I add this line `m.map(syn0, function(w) {console.log(typeof w)})` right after `syn0 = m.multiply(m.transpose(l0),l1_delta)` it console logs 'number'. I don't think they can be ints, they have lots of decimal places (unless I'm unwittingly saying something simpleminded about data types, I don't know). – Katie Jun 18 '17 at 23:45

1 Answer


It looks like you're very close; this is a nice port.

I think there's a small bug in your translation of the nonlin function. In the case where the deriv parameter is true, the equation is x * (1 - x). In your version you are using sigmoid(x) * (1 - sigmoid(x)). You don't need to call sigmoid from within sigmoid_prime: the derivative is only ever evaluated at l1, which is already a sigmoid output, and since sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), passing l1 in means the slope is simply l1 * (1 - l1).
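Concretely, a sketch against your variable names (the loop line is repeated unchanged from your code):

// l1 already holds sigmoid(l0 . syn0), so the slope of the sigmoid
// at those points is just l1 * (1 - l1), applied elementwise by m.map.
const sigmoid_prime = function(z) { return z * (1 - z) }

// inside the loop, unchanged:
// l1_delta = m.dotMultiply(l1_error, m.map(l1, sigmoid_prime))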

I hope that helps!

  • You're right, and thanks for pointing it out, but it actually doesn't seem to be making a difference. I don't have a mathematical intuition for why that is, but it seems to be the case. I'll put some console logs showing this in an edit to the question; it's a little too involved for a comment. I should also note that "the weights converge to zero" in the original question was wrong, and maybe thinking through why it's wrong will help me figure out what's going on. The last two weights do converge to zero, but the first is greater than the last two, as it should be, yet it is still too small. – Katie Jun 19 '17 at 11:26
  • (I edited out "the weights converge to zero"; I decided the least confusing thing was to get rid of it entirely in the original question, but I'm sorry if it was misleading!) – Katie Jun 19 '17 at 11:47