Does anyone know what computations take place inside the Caffe softmax layer?
I am using a pre-trained network with a softmax layer at the end.
In the testing phase, for a simple forward pass of an image, the output of the second-to-last layer ("InnerProduct") is the following: -0.20095, 0.39989, 0.22510, -0.36796, -0.21991, 0.43291, -0.22714, -0.22229, -0.08174, 0.01931, -0.05791, 0.21699, 0.00437, -0.02350, 0.02924, -0.28733, 0.19157, -0.04191, -0.07360, 0.30252
The output of the last layer ("Softmax") is the following: 0.00000, 0.44520, 0.01115, 0.00000, 0.00000, 0.89348, 0.00000, 0.00000, 0.00002, 0.00015, 0.00003, 0.00940, 0.00011, 0.00006, 0.00018, 0.00000, 0.00550, 0.00004, 0.00002, 0.05710
If I apply a softmax (using an external tool, such as Matlab) to the output of the inner product layer, I get the following values: 0.0398, 0.0726, 0.0610, 0.0337, 0.0391, 0.0751, 0.0388, 0.0390, 0.0449, 0.0496, 0.0460, 0.0605, 0.0489, 0.0476, 0.0501, 0.0365, 0.0590, 0.0467, 0.0452, 0.0659
The latter makes sense to me, since the probabilities add up to 1.0 (notice that the values from Caffe's Softmax layer sum to roughly 1.42, i.e., more than 1.0).
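For reference, what I compute externally is just the standard softmax; here is a minimal numpy sketch equivalent to my Matlab computation:

import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; Caffe's SoftmaxLayer does the same.
    e = np.exp(x - np.max(x))
    return e / e.sum()

fc8 = np.array([-0.20095, 0.39989, 0.22510, -0.36796, -0.21991,
                 0.43291, -0.22714, -0.22229, -0.08174,  0.01931,
                -0.05791,  0.21699,  0.00437, -0.02350,  0.02924,
                -0.28733,  0.19157, -0.04191, -0.07360,  0.30252])

p = softmax(fc8)
print(p)        # matches the Matlab values above
print(p.sum())  # 1.0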
Apparently, the Softmax layer in Caffe is not a straightforward softmax operation.
(I do not think it makes any difference, but I will mention that I am using the pre-trained Flickr style network; see the description here.)
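For completeness, this is roughly how I run the forward pass and read the two blobs with pycaffe (a sketch; the file paths are placeholders and the input preprocessing is omitted):

import caffe

# Placeholder paths for the deploy prototxt and the pre-trained weights.
net = caffe.Net('deploy.prototxt', 'finetune_flickr_style.caffemodel', caffe.TEST)

# Preprocessing omitted; assume net.blobs['data'] already holds the image.
net.forward()

fc8 = net.blobs['fc8_flickr'].data[0]  # output of the InnerProduct layer
prob = net.blobs['prob'].data[0]       # output of the Softmax layer
print(prob.sum())                      # in my case this is > 1.0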
EDIT:
Here is the definition of the last two layers in the prototxt. Notice that the type of the last layer is "Softmax".
layer {
  name: "fc8_flickr"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_flickr"
  param {
    lr_mult: 10
    decay_mult: 1
  }
  param {
    lr_mult: 20
    decay_mult: 0
  }
  inner_product_param {
    num_output: 20
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8_flickr"
  top: "prob"
}