I'm working on a parallel matrix-matrix multiplier in MPI. The calculation part works, but I also want to measure CPU time. I'm stuck because some processes report start and end times of 0, and for a task that should take under a second (small matrices), the program reports CPU times of 1000+ seconds, even though I can observe that it actually runs in under a second. Here's what I'm currently doing:

#include <time.h>
#include "mpi.h"
// other includes
int main()
{
    int rank;  // assumed to be set via MPI_Init / MPI_Comm_rank (omitted here)
    int start, end, min_start, max_end;
    if (rank == 0)
    {
        // setup stuff

        start = clock();
        MPI_Reduce(&min_start, &start, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

        // master computation stuff

        end = clock();
        MPI_Reduce(&max_end, &end, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

        cout << "CPU time was " 
             << (double)(max_end - min_start) / CLOCKS_PER_SEC 
             << " seconds" << endl;
    }   
    else if (rank != 0)
    {
        // setup stuff

        start = clock();
        MPI_Reduce(&min_start, &start, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

        // slave computation stuff

        end = clock();
        MPI_Reduce(&max_end, &end, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
    }
}

I'm not sure what the source of the error is. When I add this debugging output (after the if (rank == 0) / else if (rank != 0) block)

MPI_Barrier(MPI_COMM_WORLD);
for (int i=0; i<size; i++)
{
    if (rank == i)
        cout << "(" << i << ") CPU time = " 
             << end << " - " << start 
             << " = " << end - start << endl;
    MPI_Barrier(MPI_COMM_WORLD);
}

I get the following output:

CPU time was 1627.91 seconds
(1) CPU time = 0 - 0 = 0
(2) CPU time = 0 - 0 = 0
(0) CPU time = 1627938704 - 32637 = 1627906067
(3) CPU time = 10000 - 0 = 10000
RagingRoosevelt
  • First of all, I wouldn't use `clock()` at all. You can use `chrono` with C++11 or its Boost implementation with pre C++11. – Daniel Langr Apr 06 '16 at 12:16
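
For reference, a minimal sketch of that chrono-based approach (this example is mine, not from the thread; steady_clock is used because it is monotonic and therefore suited to measuring intervals):

#include <chrono>
#include <iostream>

int main()
{
    auto start = std::chrono::steady_clock::now();

    // ... intensive computation ...

    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = end - start;  // seconds, as double
    std::cout << "Elapsed: " << elapsed.count() << " s" << std::endl;
    return 0;
}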

1 Answer

First, man 3 clock says that "the clock() function returns an approximation of processor time used by the program". So to determine the processor time, you do not need to compute a difference at all; this misconception is the source of the error. You just need to call clock() once, after your intensive computations, and neglect the time consumed by the setup stuff.
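
A minimal sketch of that first approach (the variable names and placeholders are mine):

#include <time.h>
#include <iostream>

int main()
{
    // ... setup and intensive computation ...

    // clock() already reports the processor time this process has used
    // since it started, so a single call at the end is enough (this
    // neglects the setup time, as described above).
    double cpu = (double)clock() / CLOCKS_PER_SEC;
    std::cout << "CPU time: " << cpu << " seconds" << std::endl;
    return 0;
}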

If you do not want to take the setup time into account, then you really do need the difference. In that case, just use the simple and robust MPI_Wtime function, which returns a precise number of seconds elapsed since some fixed moment in the past.
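
For example, a minimal sketch of per-process timing with MPI_Wtime (the rank handling and placeholders are mine, mirroring the structure of the code in the question):

#include "mpi.h"
#include <iostream>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // ... setup stuff (excluded from the measurement) ...

    double start = MPI_Wtime();            // wall-clock seconds since a fixed point

    // ... computation stuff ...

    double elapsed = MPI_Wtime() - start;  // this process's wall time
    std::cout << "(" << rank << ") elapsed: " << elapsed << " s" << std::endl;

    MPI_Finalize();
    return 0;
}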

The value you get by subtracting the minimal start time from the maximal end time is not overall CPU time in the generally accepted sense (i.e. as reported by the time utility); it is real (wall-clock) time. To get actual CPU time, you should sum up all of the per-process times, i.e. call MPI_Reduce on the time differences with the MPI_SUM operation.
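
A sketch of that reduction, continuing the snippet above. Note that the send buffer comes first in MPI_Reduce (in the question's code the arguments appear swapped), and that MPI_Wtime returns a double, so MPI_DOUBLE is used rather than MPI_INT:

double elapsed = MPI_Wtime() - start;   // per-process time difference
double cpu_total = 0.0, real_time = 0.0;

// Sum of all per-process times: "CPU time" in the time(1) sense
MPI_Reduce(&elapsed, &cpu_total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
// Maximum over processes: the real (wall-clock) time of the whole job
MPI_Reduce(&elapsed, &real_time, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

if (rank == 0)
    std::cout << "Summed time: " << cpu_total << " s, "
              << "real time: " << real_time << " s" << std::endl;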

Sergey
  • Since `MPI_Wtime` gives wall time rather than CPU time, if other jobs are also running on the cluster the reported time will be inflated, right? My goal is to be able to compute the speedup from serial execution to parallel execution. – RagingRoosevelt Apr 06 '16 at 15:24
  • @RagingRoosevelt Speedup from serial execution should be calculated as wall time on 1 machine compared to wall time on N machines. CPU time does not factor into that calculation. – NoseKnowsAll Apr 06 '16 at 18:36
  • In addition to what @NoseKnowsAll has said, measuring CPU time is useless. Most MPI implementations spawn additional threads to process network requests and if any of them spins on polling for data, the overall CPU time as reported by `clock()` will sky-rocket. Besides, `clock()` is highly non-portable, e.g. it returns the real time on Windows. – Hristo Iliev Apr 06 '16 at 20:28
  • This is unfortunate to hear. My professor went on a long ramble about how we should use clock time rather than wall time because wall time racks up when other processes get scheduled and my job sits idling. To me, MPI_Wtime makes sense as well. – RagingRoosevelt Apr 06 '16 at 22:17
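
To make the speedup calculation from the comments concrete (the numbers below are made up for illustration):

// Speedup from serial to parallel execution, per NoseKnowsAll's comment;
// t1 and tn are wall times measured with MPI_Wtime (hypothetical values).
double t1 = 0.80;                 // wall time with 1 process
double tn = 0.25;                 // wall time with N = 4 processes
double speedup    = t1 / tn;      // 3.2x speedup
double efficiency = speedup / 4;  // 0.8 parallel efficiency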