I am trying to estimate depth from a single image. For each input image I have a ground-truth depth image, and I use a very simple fully convolutional network. Both the images and the depth images have the shape 1x227x227.
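For context, both LMDBs contain single-channel 227x227 entries. A minimal sketch of how such image/depth pairs can be written with pycaffe might look like this (the paths, keys, and uint8 conversion are illustrative assumptions, not my actual conversion script):

import lmdb
import numpy as np
from caffe.proto import caffe_pb2

def write_lmdb(db_path, arrays):
    # Write a list of 1x227x227 numpy arrays into an LMDB as Caffe Datums.
    env = lmdb.open(db_path, map_size=int(1e9))
    with env.begin(write=True) as txn:
        for i, arr in enumerate(arrays):
            datum = caffe_pb2.Datum()
            datum.channels, datum.height, datum.width = arr.shape
            datum.data = arr.astype(np.uint8).tobytes()  # raw byte encoding
            txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
    env.close()

# Hypothetical usage: identical keys/order in both DBs keep image i paired with depth i.
# write_lmdb('train_lmdb', images)        # 1x227x227 input images
# write_lmdb('train_depth_lmdb', depths)  # 1x227x227 ground-truth depth maps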
I have a train_val.prototxt like this:
layer {
  name: "train-data"
  type: "Data"
  top: "data"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "mean_train.binaryproto"
  }
  data_param {
    source: ".../train_lmdb"
    batch_size: 4
    backend: LMDB
  }
}
layer {
  name: "train-depth"
  type: "Data"
  top: "depth"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "mean_train.binaryproto"
  }
  data_param {
    source: ".../train_depth_lmdb"
    batch_size: 4
    backend: LMDB
  }
}
layer {
  name: "val-data"
  type: "Data"
  top: "data"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "mean_val.binaryproto"
  }
  data_param {
    source: ".../val_lmdb"
    batch_size: 4
    backend: LMDB
  }
}
layer {
  name: "val-depth"
  type: "Data"
  top: "depth"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "mean_train.binaryproto"
  }
  data_param {
    source: ".../val_depth_lmdb"
    batch_size: 4
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 16
    kernel_size: 17
    stride: 1
    pad: 8
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  convolution_param {
    num_output: 16
    kernel_size: 15
    stride: 1
    pad: 7
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "conv2"
  top: "conv3"
  convolution_param {
    num_output: 32
    kernel_size: 11
    stride: 1
    pad: 5
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 32
    kernel_size: 9
    stride: 1
    pad: 4
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 32
    kernel_size: 9
    stride: 1
    pad: 4
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "conv6"
  type: "Convolution"
  bottom: "conv5"
  top: "conv6"
  convolution_param {
    num_output: 64
    kernel_size: 3
    stride: 1
    pad: 1
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "conv6"
  top: "conv6"
}
layer {
  name: "conv7"
  type: "Convolution"
  bottom: "conv6"
  top: "conv7"
  convolution_param {
    num_output: 1
    kernel_size: 1
    stride: 1
    pad: 0
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "conv7"
  bottom: "depth"
  top: "loss"
}
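Every convolution uses stride 1 with padding of (kernel_size - 1) / 2, so the spatial resolution stays at 227x227 throughout and conv7 should line up with the depth blob. A quick shape check with pycaffe (just a sketch, assuming the file is saved as train_val.prototxt and the LMDB paths exist) would be:

import caffe

caffe.set_mode_cpu()
# Instantiate the TRAIN-phase net only to inspect blob shapes; no solving is done.
net = caffe.Net('train_val.prototxt', caffe.TRAIN)

for name, blob in net.blobs.items():
    print(name, blob.data.shape)
# Expected with batch_size 4: data and depth are (4, 1, 227, 227),
# and conv7 is also (4, 1, 227, 227), matching the depth blob for EuclideanLoss.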
Here is my log file. As you can see, the loss values are far too high, and I have no idea why. I have already changed the learning rate, but that does not seem to be the problem. Any ideas?
I1101 12:26:48.211413 14410 solver.cpp:404] Test net output #0: loss = 1.75823e+08 (* 1 = 1.75823e+08 loss)
I1101 12:26:48.231112 14410 solver.cpp:228] Iteration 0, loss = 1.08477e+08
I1101 12:26:48.231155 14410 solver.cpp:244] Train net output #0: loss = 1.08477e+08 (* 1 = 1.08477e+08 loss)
I1101 12:26:48.231180 14410 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I1101 12:26:48.275223 14410 solver.cpp:228] Iteration 1, loss = 1.34519e+08
I1101 12:26:48.275249 14410 solver.cpp:244] Train net output #0: loss = 1.34519e+08 (* 1 = 1.34519e+08 loss)
I1101 12:26:48.275254 14410 sgd_solver.cpp:106] Iteration 1, lr = 0.001
I1101 12:26:48.313233 14410 solver.cpp:228] Iteration 2, loss = 1.57773e+08
I1101 12:26:48.313277 14410 solver.cpp:244] Train net output #0: loss = 1.57773e+08 (* 1 = 1.57773e+08 loss)
I1101 12:26:48.313282 14410 sgd_solver.cpp:106] Iteration 2, lr = 0.001
I1101 12:26:48.349695 14410 solver.cpp:228] Iteration 3, loss = 1.12463e+08
I1101 12:26:48.349742 14410 solver.cpp:244] Train net output #0: loss = 1.12463e+08 (* 1 = 1.12463e+08 loss)
...
I1101 12:29:00.106989 14410 solver.cpp:317] Iteration 3390, loss = 1.21181e+08
I1101 12:29:00.107023 14410 solver.cpp:337] Iteration 3390, Testing net (#0)
I1101 12:29:00.107029 14410 net.cpp:693] Ignoring source layer train-data
I1101 12:29:00.107033 14410 net.cpp:693] Ignoring source layer train-depth
I1101 12:29:00.294692 14410 solver.cpp:404] Test net output #0: loss = 1.7288e+08 (* 1 = 1.7288e+08 loss)
I1101 12:29:00.294737 14410 solver.cpp:322] Optimization Done.
I1101 12:29:00.294741 14410 caffe.cpp:254] Optimization Done.
Just for fun, I set the label data to exactly the same data as the input, which means the network only has to learn the image itself, but the output is still the same. There has to be something completely wrong, right?
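In case it helps with diagnosing this: EuclideanLoss in Caffe computes 1/(2N) * sum of the squared differences over the batch (N = batch size), so the magnitude depends directly on the raw values stored in the depth LMDB. Here is a small sketch of how I would read back one entry from each LMDB and check the value ranges and the resulting loss scale (the paths are placeholders):

import lmdb
import numpy as np
from caffe.proto import caffe_pb2

def read_first(db_path):
    # Read the first Datum from an LMDB and return it as a float32 array.
    datum = caffe_pb2.Datum()
    with lmdb.open(db_path, readonly=True).begin() as txn:
        _, value = next(iter(txn.cursor()))
        datum.ParseFromString(value)
    if datum.data:  # raw uint8 bytes
        flat = np.frombuffer(datum.data, dtype=np.uint8).astype(np.float32)
    else:           # values stored in float_data instead
        flat = np.array(datum.float_data, dtype=np.float32)
    return flat.reshape(datum.channels, datum.height, datum.width)

img = read_first('train_lmdb')        # placeholder path
dep = read_first('train_depth_lmdb')  # placeholder path
print('image range:', img.min(), img.max())
print('depth range:', dep.min(), dep.max())

# EuclideanLoss = 1/(2N) * sum((prediction - depth)^2), N = batch size.
# If the prediction were all zeros, a single image would contribute roughly:
print('loss scale for a zero prediction:', (dep ** 2).sum() / 2.0)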