
I benchmarked Boost uBLAS's built-in element-by-element matrix product, element_prod, and found it was much slower than my own implementation using simple for loops, so I decided to write my own version.

I am trying to write code that performs element-by-element matrix multiplication through a statement like the following:

```cpp
matrix m1, m2, m3;
m3 = m1 * m2;
```

Here, I would like to use C++11 move semantics to return the result of the multiplication efficiently.

This is what I have so far.

```cpp
#include <boost/numeric/ublas/matrix.hpp>  // forward slashes work on every platform
#include <cassert>
#include <cstddef>

typedef boost::numeric::ublas::matrix<float> matrix;
void ElemProd();
const std::size_t X_SIZE = 400;
const std::size_t Y_SIZE = 400;
const std::size_t ITERATIONS = 500;

// Element-by-element product: temp(i, j) = m1(i, j) * m2(i, j).
matrix operator*(const matrix &m1, const matrix &m2)
{
    assert(m1.size1() == m2.size1() && m1.size2() == m2.size2());
    const std::size_t rows = m1.size1();
    const std::size_t cols = m1.size2();
    matrix temp(rows, cols);  // allocates storage; every element is overwritten below
    for (std::size_t i = 0; i < rows; i++)
    {
        for (std::size_t j = 0; j < cols; j++)
        {
            temp(i, j) = m1(i, j) * m2(i, j);
        }
    }

    //return std::move(temp);  // would block NRVO; returning the plain name allows it
    return temp;
}

void ElemProd()
{
    matrix m1(X_SIZE, Y_SIZE);
    matrix m2(X_SIZE, Y_SIZE);
    for (std::size_t i = 0; i < X_SIZE; i++)
    {
        for (std::size_t j = 0; j < Y_SIZE; j++)
        {
            m1(i, j) = 2;
            m2(i, j) = 10;
        }
    }

    matrix m3 = m1; // simply to allocate the right amount of memory for m3, to be overwritten.
    m3 = m1 * m2;
}
```

In the operator* overload, I have to create a temporary matrix, temp, to hold the result, and I suspect this allocation adds significant overhead. Any suggestions for working around it?

Another option is to make the overload's arguments non-const, overwrite one of the matrices and return it, but I think that is very risky in the long term, and I would prefer to avoid it.

Consider the case where I want something like this:

```cpp
matrix m = m1 * m2 * m3 * m4 * m5 * m6;
```

Here, my implementation allocates memory for temp five times, once per call to operator*, even though m should only need to be allocated once. The extra allocations are pure overhead.

The Vivandiere
  • Your timing loop is very flawed. An optimising compiler may remove the loop contents entirely (and why would you be timing a debug build?). Surely, just returning `temp` would allow RVO? – Skizz Jul 01 '14 at 08:24
  • @Skizz, I removed the timing part. Do you suspect that "matrix temp = m1" statement will add overhead? If so, do you have any suggestions to work around it? – The Vivandiere Jul 01 '14 at 08:37
  • The `matrix temp=m1;` is unnecessary: there's no need to copy m1, since the contents of `temp` are overwritten in the loop (assuming the copy constructor does indeed copy the elements; I'm not familiar with the boost matrix). You just need to initialise its size; there is no need to initialise the values in the matrix. And the end of the function should be `return temp;` to allow for RVO. – Skizz Jul 01 '14 at 08:48
  • @Skizz, isn't there a language feature that lets me complete the operation without involving temp? I assume I am accumulating overhead while temp is being allocated. – The Vivandiere Jul 01 '14 at 08:53
  • Yes, it's called RVO (Return Value Optimisation). To enable it, just do `return temp;` and leave the rest as is. The compiler will see that you're returning the object by value and do a bit of spooky magic to pass a pointer to `m3` in your case to the `operator*` function and use this pointer in place of `temp`. It is worth checking that the compiler has done this, just to be sure (by adding logging to the constructor). The upshot is that there is no moving or copying of `temp` when the `operator*` function ends. – Skizz Jul 01 '14 at 08:58
  • @Skizz, I modified my question a little. As per the current implementation, it seems to me that I am allocating memory needlessly when I instantiate temp. The memory to store the result of the operation has already been allocated by the 'matrix m3 = m1' statement. When I allocate memory for temp, that is just overhead. – The Vivandiere Jul 01 '14 at 09:15
  • To readers, for the record, my question is not yet answered. Please help me out if you can. – The Vivandiere Jul 02 '14 at 15:37
  • If your goal is to minimise the number of temporaries in composite operations, then google "expression templates". Because expression templates, in their full generality, are a tricky wheel to reinvent, here are some specific references: (1) "The C++ Programming Language", Section 29.5.4 (2) "C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond", Section 10.5, (3) "C++ Templates: The complete guide", Chapter 18. – Drake Jul 03 '14 at 18:57
