In Caffe, the convolution layer takes one bottom blob and convolves it with learned filters (which are initialized using the weight filler type: "Xavier", "MSRA", etc.). My question is whether we can instead convolve two bottom blobs and produce a top blob. What would be the most elegant way of doing this? The purpose is that one bottom blob will be data and the other will be a dynamic filter (changing depending on the data) produced by previous layers; I am trying to implement dynamic convolution.
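To make the intended operation concrete, here is a minimal naive reference sketch in plain C++ (not Caffe code; the function name and the assumed blob layouts are only for illustration). Each output sample is its own data sample cross-correlated with its own set of filters, assuming for simplicity stride 1, no padding, and no bias:

#include <vector>

// Reference for the intended "dynamic convolution":
//   data   : N x C x H x W
//   filter : N x K x C x kh x kw   (a separate set of K filters per sample)
//   output : N x K x (H - kh + 1) x (W - kw + 1)
// Sample n of the data is cross-correlated with filter set n.
void dynamic_conv_reference(const std::vector<float>& data,
                            const std::vector<float>& filter,
                            std::vector<float>& output,
                            int N, int C, int H, int W,
                            int K, int kh, int kw) {
  const int Ho = H - kh + 1, Wo = W - kw + 1;
  output.assign(static_cast<size_t>(N) * K * Ho * Wo, 0.f);
  for (int n = 0; n < N; ++n)
    for (int k = 0; k < K; ++k)
      for (int y = 0; y < Ho; ++y)
        for (int x = 0; x < Wo; ++x) {
          float acc = 0.f;
          for (int c = 0; c < C; ++c)
            for (int i = 0; i < kh; ++i)
              for (int j = 0; j < kw; ++j)
                acc += data[((n * C + c) * H + y + i) * W + x + j] *
                       filter[(((n * K + k) * C + c) * kh + i) * kw + j];
          output[((n * K + k) * Ho + y) * Wo + x] = acc;
        }
}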
My attempt:
One idea that came to mind was to modify filler.hpp and assign a bottom blob as the filler matrix itself (instead of "Xavier", "MSRA", etc.), hoping the convolution layer would then pick it up from there. We could set the learning rate (lr_mult) to 0 so that the weight initialized by this custom filler is never updated. However, after looking at the source code, I still don't know how to do it. At the same time, I don't want to break Caffe's workflow: ordinary convolution layers should keep working normally if I want them to.
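For reference, this is roughly where the filler comes into play; the snippet below is paraphrased from BaseConvolutionLayer<Dtype>::LayerSetUp in src/caffe/layers/base_conv_layer.cpp (exact lines vary by Caffe version):

  // Allocate the weight blob and fill it once, at setup time, using the
  // filler named in the prototxt ("xavier", "msra", ...).
  this->blobs_[0].reset(new Blob<Dtype>(weight_shape));
  shared_ptr<Filler<Dtype> > weight_filler(GetFiller<Dtype>(
      this->layer_param_.convolution_param().weight_filler()));
  weight_filler->Fill(this->blobs_[0].get());

Since this fill happens only once at setup, a filler by itself would not track a bottom blob that changes on every forward pass.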
Obviously, a more tedious way is to use a combination of Slice, Tile, and/or Scale layers to literally implement convolution. I think it would work, but it would turn out to be messy. Any other thoughts?
Edit 1:
I wrote a new layer by modifying Caffe's convolution layer. In particular, in src/caffe/layers/conv_layer.cpp, line 27 takes the weight (populated by the filler) and convolves it with the bottom blob. So instead of populating the weight from the filler, I modified the layer so that it now takes two bottoms, one of which is used directly as the weight. I then had to make some other changes, such as the following:
Normally, the weight blob has the same value for all samples, but here it will have a different value for each sample. So I changed line 32 from:
this->forward_cpu_gemm(
    bottom_data + n * this->bottom_dim_,
    weight,
    top_data + n * this->top_dim_);
to:
this->forward_cpu_gemm(
    bottom_data + n * bottom[1]->count(1),
    bottom[0]->cpu_data() + n * bottom[0]->count(1),
    top_data + n * this->top_dim_);
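Putting that change in context, the modified forward pass looks roughly like the sketch below (the class name DynamicConvolutionLayer is just a placeholder; it is assumed to derive from BaseConvolutionLayer the same way ConvolutionLayer does, with bottom[0] as the per-sample filters, bottom[1] as the data, no bias, and group = 1):

template <typename Dtype>
void DynamicConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  const Dtype* filter_data = bottom[0]->cpu_data();  // per-sample filters
  const Dtype* bottom_data = bottom[1]->cpu_data();  // data to be convolved
  Dtype* top_data = top[0]->mutable_cpu_data();
  for (int n = 0; n < this->num_; ++n) {
    // forward_cpu_gemm() runs im2col on the input and multiplies the
    // resulting column buffer by the weight matrix passed in.
    this->forward_cpu_gemm(
        bottom_data + n * bottom[1]->count(1),       // n-th data sample
        filter_data + n * bottom[0]->count(1),       // n-th filter set
        top_data + n * this->top_dim_);
  }
}

For this to compute what forward_cpu_gemm expects, bottom[0]->count(1) has to equal num_output x channels x kernel_h x kernel_w (with group = 1); that is, each sample's slice of bottom[0] must be laid out exactly like the usual weight blob.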
To make things easier, I assumed that there is no bias term, the stride is always 1, the padding is always 0, the group is always 1, etc. However, when I tested the forward pass with a simple convolution kernel (np.ones((1,1,3,3))), it gave me a strange answer. The learning rate for this kernel was set to zero so that it would not change, but I still can't get the right answer. Any suggestions would be appreciated.
Please do not propose solutions using existing layers such as Slice, Eltwise, and Crop. I have already implemented that approach; it works, but it is unbelievably complex and memory-inefficient.