
I have a big matrix/2D array, and for every possible column pair I need to compute the coherence in parallel in Python (e.g. with mpi4py). Coherence [a function] is computed at the various child processes, and each child process should send its coherence value to the parent process, which gathers the coherence values into a list. To do this, I've created a small matrix and a list of all possible column pairs as follows:

import numpy as np
from scipy import signal
from itertools import combinations
from mpi4py import MPI


comm = MPI.COMM_WORLD
nproc = comm.Get_size()
rank = comm.Get_rank()

data=np.arange(20).reshape(5, 4)
#List of all possible column pairs
data_col = list(combinations(np.transpose(data), 2)) #list

# Function creation
def myFunc(X,Y):
    # ... coherence computation elided ...
    return Real_coh

# scatter must be called by every rank; only the root supplies the data
# (note: comm.scatter expects len(data_col) == number of processes)
Data = comm.scatter(data_col if rank == 0 else None, root=0)  # one column pair per rank

Can anyone suggest how to proceed further? You are welcome to ask for any clarifications. Thanks.
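For the elided `myFunc`, one possible implementation, assuming "coherence" means the magnitude-squared coherence averaged over frequency bins (via `scipy.signal.coherence`; the `fs` and `nperseg` values here are illustrative assumptions), could look like this:

```python
import numpy as np
from scipy import signal

def myFunc(X, Y, fs=1.0):
    # magnitude-squared coherence between two 1-D signals,
    # averaged over all frequency bins to yield a single scalar
    f, Cxy = signal.coherence(X, Y, fs=fs, nperseg=min(len(X), 256))
    return Cxy.mean()

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
print(myFunc(x, x))  # a signal is fully coherent with itself, so this is ~1.0
```

Whatever the real computation is, the key point for the MPI part is that `myFunc` returns a single scalar per column pair, which each rank can then send back to the root with `comm.gather`.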


1 Answer


Check out the following script (with `comm.Barrier` for synchronization). In the script, I write and read the file as chunks of an h5py dataset, which is memory efficient.

import numpy as np
from scipy import signal
from mpi4py import MPI
import h5py as t

chunk_len = 5000 # No. of rows of a matrix
num_c = 34    # No. of column of the matrix

# Actual Dataset
data_mat = np.random.random((10000, num_c))

chunk_size = (chunk_len, 1)  # HDF5 chunk shape: chunk_len rows of one column

with t.File('file_name.h5', 'w') as hf:
    hf.create_dataset("chunked_arr",  data=data_mat, chunks=chunk_size, compression='lzf')
del data_mat

def myFunc(dset_X, dset_Y):
    # ... coherence computation elided ...
    return Real_coh

res = np.zeros((num_c, num_c))

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

for i in range(num_c):
    if i % size == rank:  # round-robin: this rank handles columns i with i % size == rank
        with t.File('file_name.h5', 'r', libver='latest') as hf:
            dset_X = hf['chunked_arr'][:, i]  # chunked read of one column
        for j in range(num_c):
            with t.File('file_name.h5', 'r', libver='latest') as hf:
                dset_Y = hf['chunked_arr'][:, j]  # chunked read of one column
            res[i, j] = myFunc(dset_X, dset_Y)
comm.Barrier()
print('Shape of final result:', res.shape)
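The loop above assigns column `i` to rank `i % size` (round-robin). A quick sanity check of that distribution, without MPI (the rank count `size = 4` is an assumed example value):

```python
num_c = 34  # number of columns, as in the script above
size = 4    # assumed number of MPI ranks

# columns handled by each rank under the round-robin rule i % size == rank
work = {rank: [i for i in range(num_c) if i % size == rank] for rank in range(size)}
for rank, cols in work.items():
    print(rank, cols)

# every column is handled exactly once across all ranks
assert sorted(c for cols in work.values() for c in cols) == list(range(num_c))
```

Note that after the main loop, each rank holds only the rows of `res` it computed itself; to assemble the full matrix on the root you would still need a collective such as `comm.Reduce(res, total, op=MPI.SUM, root=0)`.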