I am using pycaffe for a multilabel classification task. When I run solver.solve() or solver.step(2), only one iteration is executed, and then the process is killed. The IPython console says the kernel died unexpectedly, with no other error information. When I run "python Test.py" from a terminal instead, I get "Floating point exception (core dumped)".

Besides, net.forward() and net.backward() both work fine. What is the reason, and how can I solve the problem?

Luka Kerr
  • Are you running on CPU or GPU? – Shai Oct 18 '16 at 11:23
  • Did you try to train from the command line? `caffe train --solver yoursolver.prototxt`. If you manage to run from the command line, then rebuild caffe with debug information and run the training command from gdb. This will probably show where exactly the crash is happening. From there it's much easier to help. – MohamedEzz Oct 18 '16 at 14:40
  • @Shai I use GPU mode. And I use a Python data layer to load images, so I cannot train from the command line. I suspect that the parameter update causes the floating point exception, because I can create a net, load data, deprocess the data in blobs and show them. Moreover, both net.forward() and net.backward() can be executed, and my print statements are also executed up to the ApplyUpdate() method in the solver's source code. – Robinson David Oct 19 '16 at 13:29
  • @RobinsonDavid try running in CPU mode. – Shai Oct 19 '16 at 13:40
  • @Shai I run the program on a GPU node. I looked at the source code of caffe's solver class again, and found that the code before the following segment is executed: `for (int i = 0; i < callbacks_.size(); ++i) { callbacks_[i]->on_gradients_ready(); } ApplyUpdate();` I don't know whether the for loop has a problem, but in the ApplyUpdate() method I found: `CHECK(Caffe::root_solver()); Dtype rate = GetLearningRate(); if (this->param_.display() && this->iter_ % this->param_.display() == 0) { LOG(INFO) << "Iteration " << this->iter_ << ", lr = " << rate; }` – Robinson David Oct 19 '16 at 14:28
  • @Shai The LOG(INFO) message is not printed, so the floating point exception may occur in the preceding for loop or at the beginning of the ApplyUpdate() method. – Robinson David Oct 19 '16 at 14:28
  • @Shai I found that ApplyUpdate() calls GetLearningRate(), which contains the expression `this->iter_ / this->param_.stepsize()`. I did not specify stepsize in solver.prototxt, so this may be a division by zero. I will try soon. Thanks a lot for discussing with me. – Robinson David Oct 19 '16 at 14:52
  • @RobinsonDavid you found the answer yourself. Can you please post it as an answer here, so that future users who stumble upon a similar question will not have to dig through all the comments... – Shai Oct 26 '16 at 05:04
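
The diagnosis reached in the comments can be illustrated outside Caffe. The following is a minimal Python sketch, not Caffe's own code: `step_learning_rate` and `missing_stepsize` are hypothetical helper names, and it assumes the solver uses `lr_policy: "step"`, which the question does not state. In C++ an integer division by zero typically terminates the process with SIGFPE, which the shell reports as "Floating point exception"; the Python equivalent raises `ZeroDivisionError` instead.

```python
import re


def step_learning_rate(base_lr, gamma, iteration, stepsize):
    """Illustrative "step" schedule: base_lr * gamma ** (iteration // stepsize)."""
    # Integer division, as in the expression quoted in the comments.
    # With stepsize == 0 this line is where the failure happens.
    current_step = iteration // stepsize
    return base_lr * gamma ** current_step


def missing_stepsize(solver_text):
    """Return True if the solver text uses the "step" policy without a positive stepsize.

    This is a simple line-based check on the prototxt text (an assumption made
    for illustration), not a full protobuf parser.
    """
    uses_step = re.search(r'^\s*lr_policy\s*:\s*"step"', solver_text, re.MULTILINE)
    match = re.search(r'^\s*stepsize\s*:\s*(\d+)', solver_text, re.MULTILINE)
    # Treat an absent stepsize as 0, matching the suspicion in the comments.
    stepsize = int(match.group(1)) if match else 0
    return bool(uses_step) and stepsize <= 0


if __name__ == "__main__":
    # Hypothetical solver contents, for illustration only.
    solver_text = 'base_lr: 0.01\nlr_policy: "step"\ngamma: 0.1\n'
    if missing_stepsize(solver_text):
        print("stepsize is not set; add e.g. 'stepsize: 1000' to solver.prototxt")
```

Running a check like `missing_stepsize` on the contents of solver.prototxt before constructing the solver in pycaffe would surface the problem as a readable message rather than a killed kernel. The value `1000` is only a placeholder; the right step size depends on the training schedule.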
