
I am trying to create two networks with the same weights and biases, and I expect identical learning curves. At iteration 2 all blobs in the two networks are the same (data and diff), but the params (weights and biases) are different!

What am I doing wrong here?

Note: the network has no dataset shuffling and no dropout layer.

Thanks

import caffe

def CopySolver(SolverA, SolverB):
    # Copy all learnable parameters (weights and biases) from
    # SolverA's net into SolverB's net, in place.
    params = SolverA.net.params.keys()
    paramsA = {pr: (SolverA.net.params[pr][0].data, SolverA.net.params[pr][1].data) for pr in params}
    paramsB = {pr: (SolverB.net.params[pr][0].data, SolverB.net.params[pr][1].data) for pr in params}
    for pr in params:
        paramsB[pr][0][...] = paramsA[pr][0]  # weights
        paramsB[pr][1][...] = paramsA[pr][1]  # biases

solver1 = caffe.SGDSolver('lenet_solver.prototxt')
solver2 = caffe.SGDSolver('lenet_solver.prototxt')
solver1.step(1)
solver2.step(1)
CopySolver(solver1, solver2)
for i in range(10):
    solver1.step(1)
    solver2.step(1)
    print solver1.net.params['ip2'][1].diff
    print solver2.net.params['ip2'][1].diff
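To confirm the copy itself actually took effect, you could compare the arrays right after calling CopySolver (a minimal check, assuming numpy is imported as np):

import numpy as np

# Hypothetical sanity check: every weight/bias blob should now match.
for pr in solver1.net.params.keys():
    assert np.array_equal(solver1.net.params[pr][0].data,
                          solver2.net.params[pr][0].data)  # weights
    assert np.array_equal(solver1.net.params[pr][1].data,
                          solver2.net.params[pr][1].data)  # biases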
– E. Nas
2 Answers


You did not take the solver's momentum into account. After copying the net parameters from one solver object to the other, the momentum state of the solver (SGD keeps a history of past updates) still differs between solver1 and solver2. If you set "momentum: 0" in your "lenet_solver.prototxt", you should get the expected behaviour.
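To see why this matters, here is a minimal, self-contained sketch (plain numpy, not Caffe; the learning rate, momentum, and gradient values are made up) of the standard SGD-with-momentum update. Two parameter vectors that start identical but carry different momentum histories take different steps even on an identical gradient:

import numpy as np

lr, mu = 0.01, 0.9                # assumed learning rate and momentum
w1 = w2 = np.ones(3)              # identical parameters after the "copy"
v1 = np.array([0.5, 0.0, 0.0])    # solver1's leftover momentum history
v2 = np.zeros(3)                  # solver2's momentum history differs

grad = np.array([1.0, 1.0, 1.0])  # identical gradient for both nets

# SGD with momentum: v <- mu*v - lr*grad; w <- w + v
v1 = mu * v1 - lr * grad
v2 = mu * v2 - lr * grad
print w1 + v1  # [ 1.44  0.99  0.99]
print w2 + v2  # [ 0.99  0.99  0.99]  -> diverged after one step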

Otherwise, you could also save the parameters, create two new solver objects, load the parameters, and restart the training. Doing this, you ensure that both solvers start with no accumulated momentum. Here is an example of how this could look:

import caffe

solver1 = caffe.SGDSolver('lenet_solver.prototxt')
solver2 = caffe.SGDSolver('lenet_solver.prototxt')
solver1.step(1)
solver2.step(1)

# Save the weights of the first net...
solver1.net.save("tmp.caffemodel")

# ...then build two fresh solvers (zero momentum history) and load
# the saved weights into both, so they start from an identical state.
solver1 = caffe.SGDSolver('lenet_solver.prototxt')
solver2 = caffe.SGDSolver('lenet_solver.prototxt')
solver1.net.copy_from("tmp.caffemodel")
solver2.net.copy_from("tmp.caffemodel")

for i in range(10):
    solver1.step(1)
    solver2.step(1)
    print solver1.net.params['ip2'][1].diff
    print solver2.net.params['ip2'][1].diff
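If you instead want two solvers that agree including their momentum state, another route is a full solver snapshot, which stores the momentum history alongside the weights. A sketch of that approach; the .solverstate filename below is hypothetical, since the actual name depends on the snapshot_prefix in your prototxt:

import caffe

solver1 = caffe.SGDSolver('lenet_solver.prototxt')
solver1.step(1)
solver1.snapshot()  # writes <snapshot_prefix>_iter_1.caffemodel/.solverstate

# Restore a second solver from the same state; both the weights and the
# SGD momentum history come back, so the two solvers match exactly.
solver2 = caffe.SGDSolver('lenet_solver.prototxt')
solver2.restore('lenet_iter_1.solverstate')  # hypothetical filename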
– Erik B.

You are not doing anything wrong; every training run is individual. You can have two identical nets and train them on the same dataset, but each net starts from a random initialization, so that is why you get different params for each net.

– Z.Kal
  • In this code, after iteration one I copy network1 to network2 (all biases and all weights), so the initial point is the same. I expect that during the learning process there would be no difference between the two networks' parameters. – E. Nas Oct 20 '16 at 13:56