Hi everyone I need to understand how to decompose an array to assign sub-blocks to a fixed number of processors. The case where the remainder among the number of elements% processes == 0 is simple, I would like to know a performing way to do it in case the remainder is different from 0. Maybe if it is possible to have a code example (in C using MPI) to better understand these wait. Furthermore, I would like to ask you which of:
- blockwise decomposition
- cyclic decomposition
- block cyclic decomposition
it is more efficient (assuming that sending and receiving data has a certain cost), and if there is still something faster for that purpose. Thank you all.