Distributed Memory Top-K Algorithm for Large K

Question

I have a distributed array of rankings, of total size N irregularly distributed amongst NP processors, from which I need to extract the K largest elements. In the limit that K << N, K is smaller than any of the local buffer lengths, and K is relatively small in general (such that it can e.g. fit in reasonable MPI buffers), the following algorithm seems to work well

Perform a local top-K search to determine the largest K values in each local array segment
Perform a custom Allreduce which performs binary top-K reductions between buffers of size K coming from different processes.

This can be done in a semi-communication-optimal way given the communication patterns underlying MPI_Allreduce.

I'm unclear how this can be done efficiently without the above assumptions about the size of K relative to N and the local buffer sizes. In particular, I'm trying to determine an optimal (or reasonably scaling) algorithm that is compatible with the following:

K can be larger than some or all of the local buffer dimensions
K can be so large is to be impractical to communicate entirely (e.g. trying to determine the top billion elements of a 10 billon element array)

The full array nor the top-K elements need to be sorted on completion.

For arrays which reside on a single processing element, the following questions are related:

Collect top K elements from multiple sorted arrays

Average time complexity of finding top-k elements

Optimal algorithm for returning top k values from an array of length N

LazySelect should work for this. I posted [a serial implementation](https://stackoverflow.com/a/75258547/2144669) recently, but the parts that operate on the full array would parallelize. — David Eisenstat, Feb 03 '23 at 02:24

score 0 · Answer 1 · answered Feb 03 '23 at 03:37

The key here is to find the top-K repeatedly, where each time you make K the largest value satisfying the original assumption.

Assuming you'd want to get top-K' where K' is greater than the local buffer and MPI communication sizes. You could do the following to eventually find top-K':

Find top-K where K is the greatest value that fits inside local buffer and can be communicated with MPI.
Append the K elements to the top array A, and remove them from the local arrays.
Go back to #1 above until size(A) == K'.

The resulting array A should contain the top-K' elements.

Distributed Memory Top-K Algorithm for Large K

1 Answers1