I usually just see this called data redistribution, with the understanding that if you're redistributing it, you want the new distribution to be optimal under some metric, such as evenness between tasks.
This does come up in scientific/technical computing when you're trying to do computational load balancing. Even if you're computing in several dimensions, if you're redistributing spatial data that you assign to processors with a space-filling curve, this exact problem comes up, and there you often do want the data to be evenly divided.
The procedure is pretty straightforward; you start by taking an exclusive prefix sum of the xi so that you know how many items are to the "left" of you. E.g., for Noxville's example above, if you had data
[9, 6, 1, 6, 2]
the prefix sums would be
[0, 9, 15, 16, 22]
and you'd find (from the last processor's sum plus how many it has) that there are 24 items in total.
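As an aside, here's a minimal sketch of getting those two numbers with MPI collectives: MPI_Exscan gives the exclusive prefix sum directly (it leaves rank 0's result undefined, so zero it first), and an allreduce gives the total. The local counts are just hard-coded to reproduce the example above, so run it with 5 processes; the full example further down uses MPI_Scan and subtracts its own count instead, which comes to the same thing.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* hard-coded local counts just to reproduce the example above;
       run with 5 processes to get [9, 6, 1, 6, 2] */
    const int counts[] = {9, 6, 1, 6, 2};
    int nlocal = counts[rank % 5];

    int nleft = 0;      /* exclusive prefix sum: items to my "left" */
    int total = 0;      /* total number of items */
    MPI_Exscan(&nlocal, &nleft, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) nleft = 0;   /* MPI_Exscan leaves rank 0's result undefined */
    MPI_Allreduce(&nlocal, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("%3d : %d items to my left, %d total\n", rank, nleft, total);

    MPI_Finalize();
    return 0;
}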
Then you figure out how big your ideal partitions would be - say, ceil(totitems / nprocs). You can do this however you like as long as every processor will agree on what all of the partition sizes will be.
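For the example above that would be ceil(24/5) = 5 items per processor, so processors 0 through 3 would each end up with 5 items and the last processor with the remaining 4.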
Now you have a few ways to proceed. If the data items are large in some sense and you can't have two copies of them in memory, then you can start shifting data just to your nearest neighbours. You know the number of items to your left and the "excess" or "deficit" in that direction; you also know how many you have (and will have once you've done your part to fix the excess or deficit). So you start sending data to your left and right neighbours, and receiving data from them, until the processors to your left collectively have the right number of items and you do as well.
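Here's a minimal sketch of the bookkeeping for that neighbour-shifting variant (it's not part of the code below, and want_left() is just a helper name I've made up): from the exclusive prefix sum and the total, each rank works out the net number of items that has to cross its left and right boundaries. The local counts are again hard-coded to match the example. The actual shifting would MPI_Sendrecv up to that many items (capped by how many you currently hold) across each boundary, recompute the flows, and repeat until both are zero everywhere.

#include <stdio.h>
#include <mpi.h>

/* how many items ranks 0..r-1 should hold under a ceil(totvals/size) split */
static long want_left(const int r, const long totvals, const int size) {
    long per = (totvals + size - 1) / size;
    long w = (long)r * per;
    return (w < totvals) ? w : totvals;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* hard-coded counts to match the example; run with 5 processes */
    const long counts[] = {9, 6, 1, 6, 2};
    long myn = counts[rank % 5];

    long left = 0, totvals = 0;
    MPI_Exscan(&myn, &left, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) left = 0;    /* rank 0's Exscan result is undefined */
    MPI_Allreduce(&myn, &totvals, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

    /* lflow > 0: this rank owes lflow items to rank-1;
       lflow < 0: rank-1 owes -lflow items to this rank.
       rflow is the same bookkeeping for the boundary with rank+1. */
    long lflow = want_left(rank, totvals, size) - left;
    long rflow = (rank == size - 1) ? 0
                 : (left + myn) - want_left(rank + 1, totvals, size);

    printf("%3d : send %ld left, %ld right (negative means receive)\n",
           rank, lflow, rflow);

    MPI_Finalize();
    return 0;
}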
But if you can afford to have two copies of the data in memory, then you can take another approach that minimizes the number of messages sent. You can think of the number of items to your left as the starting index of your local data in the "global" array. Since you know how many items each processor will end up with, you can figure out directly which processor each of your items will end up on, and send them there directly. (For instance, in the example above, processor 0 - which has items 0..8 - knows that if every processor but the last is going to end up with 5 data items, then values 5..8 can be sent to processor 1.) Once those are sent, you simply receive until you have the amount of data you're expecting, and you're done.
Below is a simple example of doing this in C and MPI, but the basic approach should work pretty much anywhere. MPI's prefix scan operation generates inclusive sums, so we have to subtract off our own number of values to get the exclusive sum:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <time.h>

void initdata(const int rank, const int maxvals, char **data, int *nvals) {
    time_t t;
    unsigned seed;

    /* seed each rank differently, then generate a random number of
       identical characters ('A' on rank 0, 'B' on rank 1, ...) */
    t = time(NULL);
    seed = (unsigned)(t * (rank + 1));
    srand(seed);

    *nvals = (rand() % (maxvals-1)) + 1;
    *data = malloc((*nvals+1) * sizeof(char));
    for (int i=0; i<*nvals; i++) {
        (*data)[i] = 'A' + (rank % 26);
    }
    (*data)[*nvals] = '\0';
}

int assignrank(const int globalid, const int totvals, const int size) {
    int nvalsperrank = (totvals + size - 1)/size;
    return (globalid/nvalsperrank);
}

void redistribute(char **data, const int totvals, const int curvals, const int globalstart,
                  const int rank, const int size, int *newnvals) {
    const int stag = 1;
    int nvalsperrank = (totvals + size - 1)/size;

    *newnvals = nvalsperrank;
    if (rank == size-1) *newnvals = totvals - (size-1)*nvalsperrank;

    char *newdata = malloc((*newnvals+1) * sizeof(char));
    newdata[(*newnvals)] = '\0';

    MPI_Request requests[curvals];
    int nmsgs=0;

    /* figure out whose data we have, redistribute it */
    int start=0;
    int newrank = assignrank(globalstart, totvals, size);
    for (int val=1; val<curvals; val++) {
        int nextrank = assignrank(globalstart+val, totvals, size);
        if (nextrank != newrank) {
            MPI_Isend(&((*data)[start]), (val-1)-start+1, MPI_CHAR, newrank,
                      stag, MPI_COMM_WORLD, &(requests[nmsgs]));
            nmsgs++;
            start = val;
            newrank = nextrank;
        }
    }
    MPI_Isend(&((*data)[start]), curvals-start, MPI_CHAR, newrank,
              stag, MPI_COMM_WORLD, &(requests[nmsgs]));
    nmsgs++;

    /* now receive all of our data */
    int newvalssofar = 0;
    int count;
    MPI_Status status;
    while (newvalssofar != *newnvals) {
        MPI_Recv(&(newdata[newvalssofar]), *newnvals - newvalssofar, MPI_CHAR,
                 MPI_ANY_SOURCE, stag, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_CHAR, &count);
        newvalssofar += count;
    }

    /* wait until all of our sends have been received */
    MPI_Status statuses[curvals];
    MPI_Waitall(nmsgs, requests, statuses);

    /* now we can get rid of data and replace it with newdata */
    free(*data);
    *data = newdata;
}

int main(int argc, char **argv) {
    const int maxvals=30;
    int size, rank;
    char *data;
    int mycurnvals, mylvals, myfinalnvals;
    int totvals;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    initdata(rank, maxvals, &data, &mycurnvals);

    /* inclusive prefix sum of the counts; subtract our own count for the exclusive sum */
    MPI_Scan( &mycurnvals, &mylvals, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD );
    if (rank == size-1) totvals = mylvals;
    mylvals -= mycurnvals;

    /* the last rank's inclusive sum is the grand total; everyone needs it */
    MPI_Bcast( &totvals, 1, MPI_INT, size-1, MPI_COMM_WORLD );

    printf("%3d : %s %d\n", rank, data, mylvals);

    redistribute(&data, totvals, mycurnvals, mylvals, rank, size, &myfinalnvals);

    printf("%3d after: %s\n", rank, data);

    free(data);
    MPI_Finalize();
    return 0;
}
Running this you get the expected behaviour. Note that, given the way I've determined the "desired" partitioning (using ceil(totvals/nprocesses)), the final processor will generally be under-loaded. Also, I've made no attempt to ensure that order is preserved in the redistribution, although that's easy enough to do if order is important - there's a sketch of one way after the output:
$ mpirun -np 13 ./distribute
0 : AAAAAAAAAAA 0
1 : BBBBBBBBBBBB 11
2 : CCCCCCCCCCCCCCCCCCCCCCCCCC 23
3 : DDDDDDD 49
4 : EEEEEEEEE 56
5 : FFFFFFFFFFFFFFFFFF 65
6 : G 83
7 : HHHHHHH 84
8 : IIIIIIIIIIIIIIIIIIIII 91
9 : JJJJJJJJJJJJJJJJJJ 112
10 : KKKKKKKKKKKKKKKKKKKK 130
11 : LLLLLLLLLLLLLLLLLLLLLLLLLLLL 150
12 : MMMMMMMMMMMMMMMMMM 178
0 after: AAAAAAAAAAABBBBB
1 after: BBBBBBBCCCCCCCCC
2 after: CCCCCCCCCCCCCCCC
3 after: DDDDDDDCEEEEEEEE
4 after: EFFFFFFFFFFFFFFF
5 after: FFFHHHHHHHIIIIIG
6 after: IIIIIIIIIIIIIIII
7 after: JJJJJJJJJJJJJJJJ
8 after: JJKKKKKKKKKKKKKK
9 after: LLLLLLLLLLKKKKKK
10 after: LLLLLLLLLLLLLLLL
11 after: LLMMMMMMMMMMMMMM
12 after: MMMM
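On the ordering point: one simple way to preserve order - sketched here rather than implemented above - is to have each sender tag every chunk with the chunk's global starting index, so the receiver can drop it straight into its final position instead of appending in arrival order. Inside redistribute() that would look roughly like the fragment below (both MPI_Isend calls get the new tag). It assumes the global indices fit under MPI_TAG_UB, which is only guaranteed to be at least 32767, so for big problems you'd send the offset as a small header message instead.

    /* sender side: tag each chunk with its global starting index
       (replaces the fixed tag stag in both Isend calls) */
    MPI_Isend(&((*data)[start]), (val-1)-start+1, MPI_CHAR, newrank,
              globalstart+start, MPI_COMM_WORLD, &(requests[nmsgs]));

    /* receiver side: probe for the tag, then receive the chunk straight
       into its final position in newdata */
    int mynewstart = rank * nvalsperrank;   /* first global index this rank will own */
    while (newvalssofar != *newnvals) {
        MPI_Status status;
        int count;
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_CHAR, &count);
        MPI_Recv(&(newdata[status.MPI_TAG - mynewstart]), count, MPI_CHAR,
                 status.MPI_SOURCE, status.MPI_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        newvalssofar += count;
    }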