
I have a problem broadcasting a 2D double array representing a 137x137 matrix with MPI_Bcast when using more than two processes. The program is written in C and uses Open MPI. Here is what I am doing:

...
double matrix[138][138] = {0.0};
int myid, numprocs;    

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);

if(myid==ROOT){
[reading matrix from file to matrix array]
}
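/* M_SIZE is assumed to be the total element count, 138*138; the static
   2D array is contiguous in memory, so a single MPI_Bcast covers the
   whole matrix */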

MPI_Bcast(matrix, M_SIZE, MPI_DOUBLE, ROOT, MPI_COMM_WORLD);

[some operation i.e. print matrix]

There is no problem when executing with one or two processes; however, with 3 or more the program hangs. When I tried with a one-dimensional array[138], there was no problem at all.

I will be grateful for any helpful information. Thank you!
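For reference, a self-contained version of the above, with ROOT and M_SIZE defined here purely for illustration (their actual definitions are not shown in my code excerpt):

#include <stdio.h>
#include <mpi.h>

#define ROOT   0
#define N      138
#define M_SIZE (N * N)

int main(int argc, char *argv[])
{
    static double matrix[N][N];   /* static keeps the ~150 KB array off the stack */
    int myid, numprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == ROOT) {
        /* stand-in for reading the matrix from a file */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                matrix[i][j] = (double)(i + j);
    }

    /* the 2D array is contiguous, so one broadcast of N*N doubles works */
    MPI_Bcast(matrix, M_SIZE, MPI_DOUBLE, ROOT, MPI_COMM_WORLD);

    /* each rank prints one element to confirm it received the data */
    printf("rank %d of %d: matrix[%d][%d] = %f\n",
           myid, numprocs, N - 1, N - 1, matrix[N - 1][N - 1]);

    MPI_Finalize();
    return 0;
}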

user3084736
  • Could be an improper system configuration. What kind of setup are you running on? How many nodes are there? How many processes are running on each node? How are the nodes connected? If using Ethernet, how many network interfaces are up on each node? What are their network addresses? – Hristo Iliev Apr 05 '15 at 20:53
  • I am using faculty infrastructure and unfortunately I don't have specific knowledge about the configuration. We are using Open MPI; there are 12 hosts running Ubuntu, interconnected via Ethernet. I experimented more with my bug and it turned out that for small 2-dimensional arrays (e.g. [10][10]) everything is OK. When I try with array[138][138] the problem occurs for 3 or more nodes, and sometimes (not always) I get the following exception: [ux1][[29593,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.122.1 failed: Connection refused (111) – user3084736 Apr 06 '15 at 18:09
  • 1
    There is a well-known peculiarity or Open MPI - see [here](http://stackoverflow.com/questions/22475816/unable-to-run-mpi-when-transfering-large-data). Basically, you have to find out your system's network configuration and provide Open MPI with the name(s) of the specific network interface(s) that can be used for communication. – Hristo Iliev Apr 06 '15 at 18:37
  • Thank you so much! Indeed, narrowing it down to one specific network interface solved the problem. Thanks again! ;) – user3084736 Apr 06 '15 at 19:57
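For anyone hitting the same hang: the fix discussed in the comments above is to tell Open MPI's TCP BTL which network interface(s) it may use, so it does not try to connect over an unreachable one (192.168.122.1 is typically a local libvirt bridge such as virbr0). A sketch of the launch, with eth0 standing in for whatever interface actually connects the hosts:

# restrict the TCP BTL to the interface that actually connects the hosts
mpirun --mca btl_tcp_if_include eth0 -np 4 ./program

# alternatively, exclude the loopback and the libvirt bridge
mpirun --mca btl_tcp_if_exclude lo,virbr0 -np 4 ./program

The same settings can also be placed in an MCA parameter file (e.g. $HOME/.openmpi/mca-params.conf) or exported as OMPI_MCA_btl_tcp_if_include so every mpirun picks them up.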

0 Answers