In Caffe, the convolution layer takes one bottom blob and convolves it with learned filters (which are initialized using the weight filler type: "Xavier", "MSRA", etc.). My question is whether we can instead convolve two bottom blobs and produce a top blob. What would be the most elegant way of doing this? The purpose is that one bottom blob will be data and the other will be a dynamic filter (changing depending on the data) produced by previous layers; I am trying to implement dynamic convolution.
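To make the intended operation concrete, here is a minimal naive reference sketch in plain C++ (not Caffe code; the function name and the assumed blob layouts are only for illustration). Each output sample is its own data sample cross-correlated with its own set of filters, assuming for simplicity stride 1, no padding, and no bias:

#include <vector>

// Reference for the intended "dynamic convolution":
//   data   : N x C x H x W
//   filter : N x K x C x kh x kw   (a separate set of K filters per sample)
//   output : N x K x (H - kh + 1) x (W - kw + 1)
// Sample n of the data is cross-correlated with filter set n.
void dynamic_conv_reference(const std::vector<float>& data,
                            const std::vector<float>& filter,
                            std::vector<float>& output,
                            int N, int C, int H, int W,
                            int K, int kh, int kw) {
  const int Ho = H - kh + 1, Wo = W - kw + 1;
  output.assign(static_cast<size_t>(N) * K * Ho * Wo, 0.f);
  for (int n = 0; n < N; ++n)
    for (int k = 0; k < K; ++k)
      for (int y = 0; y < Ho; ++y)
        for (int x = 0; x < Wo; ++x) {
          float acc = 0.f;
          for (int c = 0; c < C; ++c)
            for (int i = 0; i < kh; ++i)
              for (int j = 0; j < kw; ++j)
                acc += data[((n * C + c) * H + y + i) * W + x + j] *
                       filter[(((n * K + k) * C + c) * kh + i) * kw + j];
          output[((n * K + k) * Ho + y) * Wo + x] = acc;
        }
}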
My attempt:
One idea that came to mind was to modify filler.hpp and assign a bottom blob as the filler matrix itself (instead of "Xavier", "MSRA", etc.), hoping the convolution layer would then pick it up from there. We could set the learning rate (lr_mult) to 0 so that the weight initialized by this custom filler is never updated. However, after looking at the source code, I still don't know how to do it. At the same time, I don't want to break Caffe's workflow: ordinary convolution layers should keep working normally if I want them to.
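For reference, this is roughly where the filler comes into play; the snippet below is paraphrased from BaseConvolutionLayer<Dtype>::LayerSetUp in src/caffe/layers/base_conv_layer.cpp (exact lines vary by Caffe version):

  // Allocate the weight blob and fill it once, at setup time, using the
  // filler named in the prototxt ("xavier", "msra", ...).
  this->blobs_[0].reset(new Blob<Dtype>(weight_shape));
  shared_ptr<Filler<Dtype> > weight_filler(GetFiller<Dtype>(
      this->layer_param_.convolution_param().weight_filler()));
  weight_filler->Fill(this->blobs_[0].get());

Since this fill happens only once at setup, a filler by itself would not track a bottom blob that changes on every forward pass.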
Obviously, a more tedious way is to use a combination of Slice, Tile, and/or Scale layers to literally implement convolution. I think it would work, but it would turn out to be messy. Any other thoughts?
Edit 1:
I wrote a new layer by modifying Caffe's convolution layer. In particular, in src/caffe/layers/conv_layer.cpp, line 27 takes the weight (populated by the filler) and convolves it with the bottom blob. So instead of populating the weight from the filler, I modified the layer so that it now takes two bottoms, one of which is used directly as the weight. I then had to make some other changes, such as the following:
Normally, the weight blob has the same value for all samples, but here it will have a different value for each sample. So I changed line 32 from:
this->forward_cpu_gemm(
    bottom_data + n * this->bottom_dim_,
    weight,
    top_data + n * this->top_dim_);
to:
this->forward_cpu_gemm(
    bottom_data + n * bottom[1]->count(1),
    bottom[0]->cpu_data() + n * bottom[0]->count(1),
    top_data + n * this->top_dim_);
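Putting that change in context, the modified forward pass looks roughly like the sketch below (the class name DynamicConvolutionLayer is just a placeholder; it is assumed to derive from BaseConvolutionLayer the same way ConvolutionLayer does, with bottom[0] as the per-sample filters, bottom[1] as the data, no bias, and group = 1):

template <typename Dtype>
void DynamicConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  const Dtype* filter_data = bottom[0]->cpu_data();  // per-sample filters
  const Dtype* bottom_data = bottom[1]->cpu_data();  // data to be convolved
  Dtype* top_data = top[0]->mutable_cpu_data();
  for (int n = 0; n < this->num_; ++n) {
    // forward_cpu_gemm() runs im2col on the input and multiplies the
    // resulting column buffer by the weight matrix passed in.
    this->forward_cpu_gemm(
        bottom_data + n * bottom[1]->count(1),       // n-th data sample
        filter_data + n * bottom[0]->count(1),       // n-th filter set
        top_data + n * this->top_dim_);
  }
}

For this to compute what forward_cpu_gemm expects, bottom[0]->count(1) has to equal num_output x channels x kernel_h x kernel_w (with group = 1); that is, each sample's slice of bottom[0] must be laid out exactly like the usual weight blob.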
To make things easier, I assumed that there is no bias term, the stride is always 1, the padding is always 0, the group is always 1, etc. However, when I tested the forward pass with a simple convolution kernel (np.ones((1,1,3,3))), it gave me a strange answer. The learning rate for this kernel was set to zero so that it would not change, but I still can't get the right answer. Any suggestions would be appreciated.
Please do not propose solutions using existing layers such as Slice, Eltwise, and Crop. I have already implemented that approach; it works, but it is unbelievably complex and memory-inefficient.