There are at least two parts to your question. The following sections address these two component pieces, but leave integration of the two to you. The example code contained below in both sections, along with explanations provided in the ScaLapack link below should provide some guidance...
From DeinoMPI:
The following sample code illustrates MPI_Type_create_subarray.
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
int myrank;
MPI_Status status;
MPI_Datatype subarray;
int array[9] = { -1, 1, 2, 3, -2, -3, -4, -5, -6 };
int array_size[] = {9};
int array_subsize[] = {3};
int array_start[] = {1};
int i;
MPI_Init(&argc, &argv);
/* Create a subarray datatype */
MPI_Type_create_subarray(1, array_size, array_subsize, array_start, MPI_ORDER_C, MPI_INT, &subarray);
MPI_Type_commit(&subarray);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0)
{
MPI_Send(array, 1, subarray, 1, 123, MPI_COMM_WORLD);
}
else if (myrank == 1)
{
for (i=0; i<9; i++)
array[i] = 0;
MPI_Recv(array, 1, subarray, 0, 123, MPI_COMM_WORLD, &status);
for (i=0; i<9; i++)
printf("array[%d] = %d\n", i, array[i]);
fflush(stdout);
}
MPI_Finalize();
return 0;
}
And from ScaLapack in C essentials:
Unfortunately, there is no C interface for ScaLAPACK or PBLAS.All
parametersshould be passed into routines and functionsby reference,
you can also define constants (i_one for 1, i_negone for -1, d_two for
2.0E+0 etc.) to pass into routines.Matrices should bestoredas 1d array(A[ i + lda*j ], not A[i][j])
To invoke ScaLAPACK routines in your program, you should first
initialize grid via BLACS routines (BLACS is enough). Second, you
should distribute your matrix over process grid (block cyclic 2d
distribution). You can do this by means of pdgeadd_ PBLAS routine.
This routine cumputes sum of two matrices A, B: B:=alphaA+betaB).
Matrices can have different distribution,in particularmatrixA can be
owned by only one process, thus, setting alpha=1, beta=0 you cansimply
copy your non-distributed matrix A into distributed matrix B.
Third, call pdgeqrf_ for matrix B. In the end of ScaLAPACK part of
code, you can collect results on one process (just copy distributed
matrix into local one via pdgeadd_). Finally, close grid via
blacs_gridexit_ and blacs_exit_.
After all, ScaLAPACK-using program should contain following:
void main(){
// Useful constants
const int i_one = 1, i_negone = -1, i_zero = 0;
const double zero=0.0E+0, one=1.0E+0;
... (See the rest of code in linked location above...)