3

I'm currently testing some divide and conquer algorithms versus their normal implementations. I'm quite new at this and I'm not sure if I should always get a better performance when using divide and conquer. For example, I've implemented an algorithm to transpose a matrix conventionally and using divide an conquer, but I still get better performance using the first version. Could it be possible or am I missing something important?

Here's the code using divide and conquer

void trasponer_DyV(Matriz &matriz)
{
    if (matriz.size() >= 2)
    {
        trasponer_DyV(matriz, 0, matriz.size(), 0, matriz.size());
    }
}

void trasponer_DyV(Matriz &matriz, int fil_inicio, int fil_fin, int col_inicio, int col_fin)
{
    int tam = fil_fin - fil_inicio;

    if (tam == 1)
        return;

    trasponer_DyV(matriz,fil_inicio, fil_inicio + tam / 2,col_inicio, col_inicio + tam / 2);
    trasponer_DyV(matriz, fil_inicio, fil_inicio + tam / 2, col_inicio + tam / 2, col_inicio + tam);
    trasponer_DyV(matriz, fil_inicio + tam / 2, fil_inicio + tam, col_inicio, col_inicio + tam / 2);
    trasponer_DyV(matriz, fil_inicio + tam / 2, fil_inicio + tam, col_inicio + tam / 2, col_inicio + tam);

    for (int i = 0; i < tam / 2; i++)
    {
        for (int j = 0; j < tam / 2; j++)
            swap(matriz[fil_inicio + i][col_inicio + tam / 2 + j], matriz[fil_inicio + tam / 2 + i][col_inicio + j]);
    }
}

And here is the brute-force one:

Matriz trasponer_fuerzabruta(const Matriz &matriz)
{
    Matriz ret;
    ret.resize(matriz.size());
    for (int i = 0; i < matriz.size(); ++i)
    {
        ret[i].resize(matriz.size());
    }

    // Todo lo que hacemos es sustituir filas por columnas.
    for (int fila = 0; fila < matriz.size(); ++fila)
    {
        for (int columna = 0; columna < matriz.size(); ++columna)
        {
            ret[columna][fila] = matriz[fila][columna];
        }
    }

    return ret;
}

Thanks in advance!

Adrisui3
  • 94
  • 1
  • 7

2 Answers2

1

The first version is doing more work - it transposes fragments in-place, then swaps them into the right place.

The second version transposes one element at a time, but does so already to the final place.

Furthermore, in a sequential process, divide & conquer is only beneficial when the working set won't fit in the L3 cache (8MB or more), which equates to a matrix of size >1000*1000.

Though parallelizing it (at CPU level) will also not be beneficial since a matrix transpose is an entirely DRAM-bound operation.

rustyx
  • 80,671
  • 25
  • 200
  • 267
0

The first function is expected to be more performant since it does not make any additional function calls, which are not free of cost.

IMHO, you'd use divide and conquer if:

  1. You are able to use multiple processors in parallel -- using threads or a MPI-like environment, or

  2. Readability of the function is improved (which results in enhances maintainability), or

  3. A higher level algorithm can be conceptually divided into smaller, potentially reusable, functions.

R Sahu
  • 204,454
  • 14
  • 159
  • 270
  • I know all the theory, but I can't see the reason why Divide and Conquer es not performing better than the base algorithm :( – Adrisui3 Mar 31 '20 at 21:45