I am trying to fine-tune a pre-trained model with Caffe. I have 1200 training samples and 300 development-set samples (small numbers, to keep the question simple). I divided the training set into 100 mini-batches of 12 samples each, and the dev set into 100 mini-batches of 3 samples each. My goal is to loop over the training data and test after every epoch (1 epoch = 100 iterations). I want to know the difference between the following:

solver.step(100)

and

niter = 200
for it in range(niter):
  solver.step(1)

and

solver.solve()

I know that step() carries out all three stages (forward prop, back prop, and update) and takes the number of iterations as input. So I thought step(100) in this setting means 1 epoch, and step(1) inside a loop of 200 means 2 epochs. Is that right?
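The arithmetic behind that reading can be sketched in plain Python. This is only a sanity check: it assumes each solver iteration consumes exactly one mini-batch of 12 samples, as described above. `iters_per_epoch` and `epochs_covered` are illustrative names, not pycaffe calls.

```python
# Numbers taken from the question; the one-batch-per-iteration rule is an assumption.
train_samples = 1200
train_batch_size = 12

# Iterations needed to see every training sample once.
iters_per_epoch = train_samples // train_batch_size


def epochs_covered(total_iterations):
    """Epochs covered by a given number of solver iterations."""
    return total_iterations / iters_per_epoch


# solver.step(100): one call, 100 iterations.
one_call = epochs_covered(100)

# 200 calls of solver.step(1): 200 iterations in total.
looped = epochs_covered(sum(1 for _ in range(200)))

print(iters_per_epoch, one_call, looped)
```

Under these assumptions, what matters is the total number of iterations, not how they are split across step() calls.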

Also, when I used solver.solve(), I didn't understand the "Test net output #N" lines. Why are there 11 of them? The output:

I1128 13:19:55.134804  4229 sgd_solver.cpp:105] Iteration 0, lr = 0.001
I1128 13:20:42.253166  4239 data_layer.cpp:73] Restarting data prefetching from start.
I1128 13:20:44.436962  4229 solver.cpp:330] Iteration 100, Testing net (#0)
I1128 13:20:44.835551  4229 solver.cpp:397]     Test net output #0: accuracy = 0.888889
I1128 13:20:44.835697  4229 solver.cpp:397]     Test net output #1: loss = 1.40894 (* 1 = 1.40894 loss)
I1128 13:20:44.835763  4229 solver.cpp:397]     Test net output #2: prob = 0.333332
I1128 13:20:44.835824  4229 solver.cpp:397]     Test net output #3: prob = 1.03709e-06
I1128 13:20:44.835886  4229 solver.cpp:397]     Test net output #4: prob = 0.666667
I1128 13:20:44.835945  4229 solver.cpp:397]     Test net output #5: prob = 0.333333
I1128 13:20:44.836004  4229 solver.cpp:397]     Test net output #6: prob = 0.333333
I1128 13:20:44.836062  4229 solver.cpp:397]     Test net output #7: prob = 0.333333
I1128 13:20:44.836119  4229 solver.cpp:397]     Test net output #8: prob = 0.333333
I1128 13:20:44.836179  4229 solver.cpp:397]     Test net output #9: prob = 4.97935e-14
I1128 13:20:44.836236  4229 solver.cpp:397]     Test net output #10: prob = 0.666667
I1128 13:21:38.373956  4239 data_layer.cpp:73] Restarting data prefetching from start.
I1128 13:21:40.397017  4229 solver.cpp:447] Snapshotting to binary proto file _iter_200.caffemodel
I1128 13:21:40.884833  4229 sgd_solver.cpp:273] Snapshotting solver state to binary proto file _iter_200.solverstate
I1128 13:21:41.127754  4229 solver.cpp:330] Iteration 200, Testing net (#0)
I1128 13:21:41.419747  4229 solver.cpp:397]     Test net output #0: accuracy = 0.444444
I1128 13:21:41.419805  4229 solver.cpp:397]     Test net output #1: loss = 12.7511 (* 1 = 12.7511 loss)
I1128 13:21:41.419816  4229 solver.cpp:397]     Test net output #2: prob = 0.126513
I1128 13:21:41.419824  4229 solver.cpp:397]     Test net output #3: prob = 0.873487
I1128 13:21:41.419834  4229 solver.cpp:397]     Test net output #4: prob = 1.36409e-10
I1128 13:21:41.419843  4229 solver.cpp:397]     Test net output #5: prob = 5.67621e-21
I1128 13:21:41.419852  4229 solver.cpp:397]     Test net output #6: prob = 0.667183
I1128 13:21:41.419862  4229 solver.cpp:397]     Test net output #7: prob = 0.332817
I1128 13:21:41.419870  4229 solver.cpp:397]     Test net output #8: prob = 4.48244e-05
I1128 13:21:41.419880  4229 solver.cpp:397]     Test net output #9: prob = 0.666622
I1128 13:21:41.419908  4229 solver.cpp:397]     Test net output #10: prob = 0.333333
I1128 13:21:41.419916  4229 solver.cpp:315] Optimization Done.

The output of solver.step(200):

I1128 13:47:02.000474  5385 sgd_solver.cpp:105] Iteration 0, lr = 0.001
I1128 13:47:48.170166  5397 data_layer.cpp:73] Restarting data prefetching from start.
I1128 13:47:50.009802  5385 solver.cpp:330] Iteration 100, Testing net (#0)
I1128 13:47:50.403555  5385 solver.cpp:397]     Test net output #0: accuracy = 1
I1128 13:47:50.403700  5385 solver.cpp:397]     Test net output #1: loss = 0.0764709 (* 1 = 0.0764709 loss)
I1128 13:47:50.403764  5385 solver.cpp:397]     Test net output #2: prob = 4.34344e-09
I1128 13:47:50.403823  5385 solver.cpp:397]     Test net output #3: prob = 0.333333
I1128 13:47:50.403883  5385 solver.cpp:397]     Test net output #4: prob = 0.666667
I1128 13:47:50.403942  5385 solver.cpp:397]     Test net output #5: prob = 0.306925
I1128 13:47:50.404002  5385 solver.cpp:397]     Test net output #6: prob = 0.359741
I1128 13:47:50.404062  5385 solver.cpp:397]     Test net output #7: prob = 0.333333
I1128 13:47:50.404121  5385 solver.cpp:397]     Test net output #8: prob = 0.181897
I1128 13:47:50.404181  5385 solver.cpp:397]     Test net output #9: prob = 0.151436
I1128 13:47:50.404240  5385 solver.cpp:397]     Test net output #10: prob = 0.666667
I1128 13:48:39.077320  5397 data_layer.cpp:73] Restarting data prefetching from start.

Solver File:

test_iter: 3
test_interval: 100
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
#momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 10000 iterations
display: 10000
# The maximum number of iterations
max_iter: 200
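A quick check of how many dev samples one test pass covers with this solver file. It assumes the test data layer uses the batch size of 3 described above; the prototxt for that layer is not shown, so this is an assumption. The variable names are illustrative only.

```python
# Values from the solver file and the question text.
test_iter = 3          # from the solver file
test_batch_size = 3    # assumed: dev mini-batches of 3 samples
dev_samples = 300

# Samples evaluated in one "Testing net" pass.
samples_per_test = test_iter * test_batch_size

# test_iter that would cover the whole dev set under the same assumption.
test_iter_for_full_dev = dev_samples // test_batch_size

# Accuracies reported in the logs, and the number of correct samples
# each one would imply if only samples_per_test samples were evaluated.
logged_accuracies = [0.888889, 0.444444, 0.555556, 1.0]
implied_correct = [round(acc * samples_per_test) for acc in logged_accuracies]

print(samples_per_test, test_iter_for_full_dev, implied_correct)
```

If the assumption holds, each test pass looks at only 9 of the 300 dev samples, which would be consistent with the logged accuracies all being multiples of 1/9.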

----- EXPERIMENT -----

I tried solver.step(201) instead of 200 and the output was similar to solver.solve():

I1128 13:55:08.297905  5757 sgd_solver.cpp:105] Iteration 0, lr = 0.001
I1128 13:55:56.028899  5769 data_layer.cpp:73] Restarting data prefetching from start.
I1128 13:55:58.175401  5757 solver.cpp:330] Iteration 100, Testing net (#0)
I1128 13:55:58.536119  5757 solver.cpp:397]     Test net output #0: accuracy = 1
I1128 13:55:58.536265  5757 solver.cpp:397]     Test net output #1: loss = 5.43066e-07 (* 1 = 5.43066e-07 loss)
I1128 13:55:58.536326  5757 solver.cpp:397]     Test net output #2: prob = 7.04335e-10
I1128 13:55:58.536393  5757 solver.cpp:397]     Test net output #3: prob = 0.333333
I1128 13:55:58.536453  5757 solver.cpp:397]     Test net output #4: prob = 0.666667
I1128 13:55:58.536512  5757 solver.cpp:397]     Test net output #5: prob = 0.333333
I1128 13:55:58.536571  5757 solver.cpp:397]     Test net output #6: prob = 0.333333
I1128 13:55:58.536628  5757 solver.cpp:397]     Test net output #7: prob = 0.333333
I1128 13:55:58.536685  5757 solver.cpp:397]     Test net output #8: prob = 0.333332
I1128 13:55:58.536743  5757 solver.cpp:397]     Test net output #9: prob = 1.64471e-06
I1128 13:55:58.536801  5757 solver.cpp:397]     Test net output #10: prob = 0.666667
I1128 13:56:50.299724  5769 data_layer.cpp:73] Restarting data prefetching from start.
I1128 13:56:52.169708  5757 solver.cpp:330] Iteration 200, Testing net (#0)
I1128 13:56:52.469816  5757 solver.cpp:397]     Test net output #0: accuracy = 0.555556
I1128 13:56:52.469964  5757 solver.cpp:397]     Test net output #1: loss = 8.99609 (* 1 = 8.99609 loss)
I1128 13:56:52.470028  5757 solver.cpp:397]     Test net output #2: prob = 0.333333
I1128 13:56:52.470088  5757 solver.cpp:397]     Test net output #3: prob = 0.666667
I1128 13:56:52.470146  5757 solver.cpp:397]     Test net output #4: prob = 1.07012e-10
I1128 13:56:52.470206  5757 solver.cpp:397]     Test net output #5: prob = 1.24848e-15
I1128 13:56:52.470264  5757 solver.cpp:397]     Test net output #6: prob = 0.666667
I1128 13:56:52.470322  5757 solver.cpp:397]     Test net output #7: prob = 0.333333
I1128 13:56:52.470381  5757 solver.cpp:397]     Test net output #8: prob = 7.49798e-06
I1128 13:56:52.470438  5757 solver.cpp:397]     Test net output #9: prob = 0.6666
I1128 13:56:52.470496  5757 solver.cpp:397]     Test net output #10: prob = 0.333392

The same for

niter = 201
for it in range(niter):
   solver.step(1)

the output is:

I1128 14:00:35.986286  6020 sgd_solver.cpp:105] Iteration 0, lr = 0.001
I1128 14:01:26.579378  6030 data_layer.cpp:73] Restarting data prefetching from start.
I1128 14:01:28.678328  6020 solver.cpp:330] Iteration 100, Testing net (#0)
I1128 14:01:28.977371  6020 solver.cpp:397]     Test net output #0: accuracy = 0.888889
I1128 14:01:28.977429  6020 solver.cpp:397]     Test net output #1: loss = 0.953584 (* 1 = 0.953584 loss)
I1128 14:01:28.977444  6020 solver.cpp:397]     Test net output #2: prob = 0.333271
I1128 14:01:28.977458  6020 solver.cpp:397]     Test net output #3: prob = 6.24673e-05
I1128 14:01:28.977475  6020 solver.cpp:397]     Test net output #4: prob = 0.666667
I1128 14:01:28.977533  6020 solver.cpp:397]     Test net output #5: prob = 0.333333
I1128 14:01:28.977589  6020 solver.cpp:397]     Test net output #6: prob = 0.333333
I1128 14:01:28.977644  6020 solver.cpp:397]     Test net output #7: prob = 0.333333
I1128 14:01:28.977699  6020 solver.cpp:397]     Test net output #8: prob = 0.333333
I1128 14:01:28.977752  6020 solver.cpp:397]     Test net output #9: prob = 4.3448e-11
I1128 14:01:28.977813  6020 solver.cpp:397]     Test net output #10: prob = 0.666667
I1128 14:02:20.853430  6030 data_layer.cpp:73] Restarting data prefetching from start.
I1128 14:02:22.835402  6020 solver.cpp:330] Iteration 200, Testing net (#0)
I1128 14:02:23.163835  6020 solver.cpp:397]     Test net output #0: accuracy = 0.888889
I1128 14:02:23.163980  6020 solver.cpp:397]     Test net output #1: loss = 0.154905 (* 1 = 0.154905 loss)
I1128 14:02:23.164043  6020 solver.cpp:397]     Test net output #2: prob = 0.666667
I1128 14:02:23.164103  6020 solver.cpp:397]     Test net output #3: prob = 0.333333
I1128 14:02:23.164161  6020 solver.cpp:397]     Test net output #4: prob = 1.96634e-17
I1128 14:02:23.164222  6020 solver.cpp:397]     Test net output #5: prob = 0.0826814
I1128 14:02:23.164280  6020 solver.cpp:397]     Test net output #6: prob = 0.583985
I1128 14:02:23.164340  6020 solver.cpp:397]     Test net output #7: prob = 0.333333
I1128 14:02:23.164405  6020 solver.cpp:397]     Test net output #8: prob = 0.666667
I1128 14:02:23.164464  6020 solver.cpp:397]     Test net output #9: prob = 1.03834e-10
I1128 14:02:23.164525  6020 solver.cpp:397]     Test net output #10: prob = 0.333333

Can we assume that the three are equivalent? If so, when should each one be used?

Fadwa
Regarding the "prob" output: you have a `"top"` named `"prob"` (with dim=9), and no other layer takes this `"prob"` as an input. Therefore, Caffe writes its values to the log. You can use a [`"Silence"`](http://caffe.help/manual/layers/silence.html) layer to suppress this output. – Shai Nov 28 '17 at 13:53
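Following that comment, the count of 11 can be tallied as below. This is a plain-Python illustration, not a pycaffe call; the per-blob sizes are taken from the logs and from the dim=9 mentioned in the comment.

```python
# Number of logged values per unconsumed top blob of the test net.
test_outputs = {"accuracy": 1, "loss": 1, "prob": 9}

# One "Test net output #k" line per value, numbered from 0.
lines = [name for name, count in test_outputs.items() for _ in range(count)]
total = len(lines)
last_index = total - 1

print(total, last_index)
```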
