I have a problem printing a sparse matrix in a C++/MPI program that I hope you can help me solve.
Problem: I need to print a sparse matrix as a list of triples (x, y, v_xy) to a .txt file in a program that has been parallelized with MPI. Since I am new to MPI, I decided not to use the parallel I/O routines provided by the library and instead let the master process (rank 0 in my case) print the output. However, the time needed to print the matrix increases when I increase the number of processors:
- 1 processor: 11.7 s
- 2 processors: 26.4 s
- 4 processors: 25.4 s
I have already verified that the output is exactly the same in the three cases. Here is the relevant section of the code:
if (rank == 0)
{
    sw.start();
    std::ofstream ofs_output(output_file);
    targets.print(ofs_output);
    ofs_output.close();
    sw.stop();
    time_output = sw.get_duration();
    std::cout << time_output << std::endl;
}
My stopwatch sw measures wall-clock time using the gettimeofday function.
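For reference, a stopwatch of this kind can be sketched as follows. This is only a minimal illustration: the class name, the member names, and the choice of seconds as the unit are assumptions and may differ from the actual class.

```cpp
#include <sys/time.h>

// Hypothetical stopwatch built on gettimeofday (POSIX); names are illustrative only.
class stopwatch
{
public:
    // Record the wall-clock time at the start of the measured region.
    void start() { gettimeofday(&_begin, nullptr); }

    // Record the wall-clock time at the end of the measured region.
    void stop() { gettimeofday(&_end, nullptr); }

    // Elapsed wall-clock time between start() and stop(), in seconds (assumed unit).
    double get_duration() const
    {
        return static_cast<double>(_end.tv_sec - _begin.tv_sec)
             + static_cast<double>(_end.tv_usec - _begin.tv_usec) * 1e-6;
    }

private:
    timeval _begin{};
    timeval _end{};
};
```

Because this measures wall-clock time rather than CPU time, the result includes any time during which the process is waiting or descheduled.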
The print method for the targets matrix is the following:
void sparse_matrix::print(std::ofstream &ofs)
{
    int temp_row;
    for (const_iterator iter_row = _matrix.begin(); iter_row != _matrix.end(); ++iter_row)
    {
        temp_row = (*iter_row).get_key();
        for (value_type::const_iterator iter_col = (*iter_row).get_value().begin();
             iter_col != (*iter_row).get_value().end(); ++iter_col)
        {
            ofs << temp_row << "," << (*iter_col).get_key() << "," << (*iter_col).get_value() << std::endl;
        }
    }
}
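Note that std::endl flushes the stream after every triple. A variant that writes '\n' and flushes once at the end could be used to rule out flush overhead, although it would not by itself explain why the time depends on the number of processors. The sketch below is only an illustration: sparse_rows is a hypothetical std::map-based stand-in for the matrix class, and print_triples is a made-up name.

```cpp
#include <map>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for the sparse matrix: row index -> (column index -> value).
using sparse_rows = std::map<int, std::map<int, double>>;

// Writes one "row,col,value" line per stored entry. Lines end with '\n'
// instead of std::endl, so the stream is not flushed after every entry.
void print_triples(const sparse_rows &matrix, std::ostream &os)
{
    for (const auto &row : matrix)
    {
        for (const auto &entry : row.second)
        {
            os << row.first << ',' << entry.first << ',' << entry.second << '\n';
        }
    }
    os.flush(); // single flush once all entries have been written
}
```

Taking a std::ostream reference rather than std::ofstream keeps the function usable with both files and in-memory streams, which makes it easy to compare the output against the original version.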
I do not understand what is causing the slow-down since only processor 0 does the output and this is the very last operation of the program: all the other processors are done while processor 0 prints the output. Do you have any idea?