I am using the conv2 function in Armadillo with an image size of 224x224 and a mask size of 10x10. For a 3-channel image, I am doing something like:
arma::mat temp(215, 215, arma::fill::zeros);
// The full convolution output is 233x233; rows/cols 9..223 form the 215x215 valid part.
for (int i = 0; i < 3; i++)
    temp += arma::mat(arma::conv2(image_channel, channel_mask)).submat(9, 9, 223, 223);
I only want the valid part of the convolution, which is why I use submat. This code is executed in a loop 32 times with different masks. The 32 iterations take 2.37 seconds, which is far slower than Octave: Octave executes the same code in 0.25 seconds.
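For reference, here is a minimal sketch of the index arithmetic behind the valid window, plus a direct valid-only convolution that can be used to cross-check a cropped result. It uses only the C++ standard library. The names Mat, Window, valid_window and conv2_valid are hypothetical helpers introduced for illustration; they are not part of Armadillo, and this is not how Armadillo implements conv2. For an n-sample input and a k-sample mask, the full output has n + k - 1 samples, and the valid part spans 0-based indices k - 1 through n - 1 (so 9 through 223 for n = 224, k = 10).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix. Hypothetical helper type, not part of Armadillo.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Inclusive 0-based index range of the valid part inside a full
// convolution output, along one dimension.
struct Window {
    std::size_t first, last;
};

// n = input length, k = mask length (requires 1 <= k <= n).
Window valid_window(std::size_t n, std::size_t k) {
    assert(k >= 1 && n >= k);
    return Window{k - 1, n - 1};
}

// Direct "valid" 2-D convolution (mask is flipped, as in a true convolution).
// Output size is (img.rows - ker.rows + 1) x (img.cols - ker.cols + 1).
Mat conv2_valid(const Mat& img, const Mat& ker) {
    assert(ker.rows >= 1 && ker.cols >= 1);
    assert(img.rows >= ker.rows && img.cols >= ker.cols);
    Mat out(img.rows - ker.rows + 1, img.cols - ker.cols + 1);
    for (std::size_t r = 0; r < out.rows; ++r) {
        for (std::size_t c = 0; c < out.cols; ++c) {
            double acc = 0.0;
            for (std::size_t i = 0; i < ker.rows; ++i) {
                for (std::size_t j = 0; j < ker.cols; ++j) {
                    // Full-output index is (r + ker.rows - 1, c + ker.cols - 1).
                    acc += img.at(r + ker.rows - 1 - i, c + ker.cols - 1 - j)
                           * ker.at(i, j);
                }
            }
            out.at(r, c) = acc;
        }
    }
    return out;
}
```

Calling valid_window(224, 10) gives the first and last indices to pass to a crop of the full output, and conv2_valid gives a slow but simple reference result to compare against. It is meant as a correctness check on the window, not as a claim about where the time difference comes from.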
Both Octave and Armadillo are set up to use OpenBLAS, and I have defined the appropriate flags in the C++ file (e.g. ARMA_USE_BLAS etc.). Can anybody please tell me what the problem is here?