Does Divide and Conquer always get better performance?

Question

I'm currently testing some divide and conquer algorithms against their conventional implementations. I'm quite new to this and I'm not sure whether divide and conquer should always give better performance. For example, I've implemented a matrix transpose both conventionally and using divide and conquer, but the conventional version is still faster. Is that possible, or am I missing something important?

Here's the code using divide and conquer

#include <utility>  // std::swap

// Recursive helper, defined before the wrapper below that calls it.
// Assumes a square matrix whose size is a power of two; col_fin is unused.
void trasponer_DyV(Matriz &matriz, int fil_inicio, int fil_fin, int col_inicio, int col_fin)
{
    int tam = fil_fin - fil_inicio;

    if (tam <= 1)
        return;

    int mitad = tam / 2;

    // Transpose each of the four quadrants in place.
    trasponer_DyV(matriz, fil_inicio, fil_inicio + mitad, col_inicio, col_inicio + mitad);
    trasponer_DyV(matriz, fil_inicio, fil_inicio + mitad, col_inicio + mitad, col_inicio + tam);
    trasponer_DyV(matriz, fil_inicio + mitad, fil_inicio + tam, col_inicio, col_inicio + mitad);
    trasponer_DyV(matriz, fil_inicio + mitad, fil_inicio + tam, col_inicio + mitad, col_inicio + tam);

    // Swap the top-right quadrant with the bottom-left one.
    for (int i = 0; i < mitad; i++)
    {
        for (int j = 0; j < mitad; j++)
            std::swap(matriz[fil_inicio + i][col_inicio + mitad + j], matriz[fil_inicio + mitad + i][col_inicio + j]);
    }
}

// Entry point: transposes the whole matrix in place.
void trasponer_DyV(Matriz &matriz)
{
    if (matriz.size() >= 2)
    {
        int n = static_cast<int>(matriz.size());
        trasponer_DyV(matriz, 0, n, 0, n);
    }
}

And here is the brute-force one:

#include <cstddef>  // std::size_t

Matriz trasponer_fuerzabruta(const Matriz &matriz)
{
    const std::size_t n = matriz.size();

    // Allocate the n x n result matrix.
    Matriz ret;
    ret.resize(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        ret[i].resize(n);
    }

    // All we do is swap rows for columns.
    for (std::size_t fila = 0; fila < n; ++fila)
    {
        for (std::size_t columna = 0; columna < n; ++columna)
        {
            ret[columna][fila] = matriz[fila][columna];
        }
    }

    return ret;
}

Thanks in advance!

To compare performance between two algorithms, we need to see the code for both algorithms. — Jerry Jeremiah, Mar 31 '20 at 21:10

score 1 · Accepted Answer · answered Mar 31 '20 at 21:45

The first version (divide and conquer) is doing more work: it transposes fragments in place, then swaps them into the right position.

The second version (brute force) transposes one element at a time, but writes each element straight to its final position.

Furthermore, in a sequential process, divide and conquer is only beneficial when the working set doesn't fit in the L3 cache (8 MB or more), which corresponds to a matrix larger than about 1000 x 1000.

Parallelizing it (at CPU level) will not help either, since a matrix transpose is an entirely DRAM-bound operation.
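As a rough check of the cache-size figure above, the working set can be estimated from the matrix size. This is only a sketch: the function names and the 8 MiB constant are hypothetical, the element size is an assumption (the question does not show what `Matriz` stores), and the per-row overhead of a vector of vectors is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cache size used only for illustration; real L3 sizes vary by CPU.
const std::size_t kAssumedL3Bytes = 8u * 1024u * 1024u;

// Bytes touched when transposing an n x n matrix.
// copies is 1 for an in-place transpose and 2 when a separate result matrix is written.
std::size_t working_set_bytes(std::size_t n, std::size_t element_size, std::size_t copies)
{
    assert(copies == 1 || copies == 2);
    return n * n * element_size * copies;
}

// True if the estimated working set fits in the assumed L3 size.
bool fits_in_assumed_l3(std::size_t n, std::size_t element_size, std::size_t copies)
{
    return working_set_bytes(n, element_size, copies) <= kAssumedL3Bytes;
}
```

With 4-byte elements and a separate result matrix, `working_set_bytes(1000, 4, 2)` is 8,000,000 bytes, which is right at the edge of the assumed cache, while a 2000 x 2000 matrix needs 32,000,000 bytes and clearly does not fit.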

rustyx

What can I do to optimize the first version? – Adrisui3 Mar 31 '20 at 21:48

Think about how to transpose fragments pair-wise in a single pass. Consider the diagonal symmetry of the matrix. – rustyx Mar 31 '20 at 21:52
It took some time, but I saw what you meant. Thank you so much, it's been really helpful! – Adrisui3 Apr 05 '20 at 19:13
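One possible reading of the hint in the comments, as a minimal sketch rather than the solution the asker actually arrived at: recurse only on blocks along the diagonal, and swap each off-diagonal block with its mirror image in a single pass, so every element is moved once. The function names and the `kBaseBlock` cutoff are hypothetical, and `Matriz` is assumed to be a square vector of vectors of `int`, since the question does not show its definition.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: the question never shows Matriz; a square vector of vectors of int is used here.
using Matriz = std::vector<std::vector<int>>;

// Hypothetical cutoff below which plain loops are used instead of further recursion.
const std::size_t kBaseBlock = 16;

// Swaps the block at rows [row, row + rows), columns [col, col + cols) with its
// mirror image across the main diagonal. The block must not touch the diagonal.
void swap_mirror_blocks(Matriz &m, std::size_t row, std::size_t col,
                        std::size_t rows, std::size_t cols)
{
    if (rows <= kBaseBlock && cols <= kBaseBlock)
    {
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                std::swap(m[row + i][col + j], m[col + j][row + i]);
        return;
    }
    if (rows >= cols)
    {
        // Split the taller dimension in two.
        const std::size_t half = rows / 2;
        swap_mirror_blocks(m, row, col, half, cols);
        swap_mirror_blocks(m, row + half, col, rows - half, cols);
    }
    else
    {
        // Split the wider dimension in two.
        const std::size_t half = cols / 2;
        swap_mirror_blocks(m, row, col, rows, half);
        swap_mirror_blocks(m, row, col + half, rows, cols - half);
    }
}

// Transposes the square block of the given size whose top-left corner is (start, start).
void transpose_diagonal_block(Matriz &m, std::size_t start, std::size_t size)
{
    if (size <= 1)
        return;
    const std::size_t half = size / 2;
    transpose_diagonal_block(m, start, half);                        // top-left
    transpose_diagonal_block(m, start + half, size - half);          // bottom-right
    swap_mirror_blocks(m, start, start + half, half, size - half);   // top-right <-> bottom-left
}

// Entry point: in-place transpose of a square matrix of any size.
void transpose_pairwise(Matriz &m)
{
    assert(m.empty() || m[0].size() == m.size());
    transpose_diagonal_block(m, 0, m.size());
}
```

Calling `transpose_pairwise(matriz)` transposes the matrix in place. Unlike the version in the question, the off-diagonal quadrants are never transposed separately before being swapped, and the size does not need to be a power of two. Whether it beats the brute-force loop still depends on the matrix size and the cache, as discussed in the answer.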

score 0 · Answer 2 · answered Mar 31 '20 at 21:11

The conventional (non-recursive) function is expected to perform better since it makes no additional function calls, which are not free.
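To put a number on that overhead, here is a small sketch that counts how many times the recursive overload in the question is entered for an n x n matrix. The function names are hypothetical, and n is assumed to be a power of two, as the recursive code requires.

```cpp
#include <cassert>
#include <cstddef>

// Calls to the recursive overload for an n x n matrix (n a power of two):
// one call for the current block plus four recursive calls on half-size blocks.
std::size_t count_dyv_calls(std::size_t n)
{
    assert(n >= 1 && (n & (n - 1)) == 0);  // n must be a power of two
    if (n == 1)
        return 1;
    return 1 + 4 * count_dyv_calls(n / 2);
}

// Closed form of the same recurrence: (4 * n^2 - 1) / 3.
std::size_t count_dyv_calls_closed_form(std::size_t n)
{
    return (4 * n * n - 1) / 3;
}
```

For a 1024 x 1024 matrix this gives 1,398,101 calls, more than one call per element, whereas the brute-force version is a single function call wrapped around two nested loops.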

IMHO, you'd use divide and conquer if:

You can use multiple processors in parallel, via threads or an MPI-like environment, or
Readability of the function is improved (which results in enhanced maintainability), or
A higher-level algorithm can be conceptually divided into smaller, potentially reusable, functions.

R Sahu

I know all the theory, but I can't see the reason why Divide and Conquer is not performing better than the base algorithm :( – Adrisui3 Mar 31 '20 at 21:45
