I have an array of n items of type T, and a categorization function f(t) that assigns each item a category number from 0 to k-1, where k is the number of categories. The goal is to divide the array into k contiguous segments, one per category, and rearrange the items so that each one ends up in the segment for its category.
With separate input and output arrays, I could do this in O(n), but I need to do it in-place (i.e., using swaps as the basic operation) and, if possible, with a parallelizable algorithm.
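For reference, here is a minimal sketch of the kind of O(n) out-of-place approach meant here, assuming a counting-sort-style scatter: count the items per category, turn the counts into segment start offsets with a prefix sum, then copy each item into the next free slot of its segment. The function name `partition_out_of_place` is hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def partition_out_of_place(items: List[T], f: Callable[[T], int], k: int) -> List[T]:
    # Pass 1: count how many items fall into each category.
    counts = [0] * k
    for t in items:
        counts[f(t)] += 1
    # Exclusive prefix sum: start[c] is the first index of segment c.
    start = [0] * k
    for c in range(1, k):
        start[c] = start[c - 1] + counts[c - 1]
    # Pass 2: scatter each item into the next free slot of its segment.
    # The copy is only a placeholder; every slot is overwritten below.
    out = list(items)
    for t in items:
        c = f(t)
        out[start[c]] = t
        start[c] += 1
    return out
```

For example, `partition_out_of_place([5, 3, 8, 1, 4], lambda x: x % 3, 3)` returns `[3, 1, 4, 5, 8]`. This is stable and linear, but it needs the second array of size n.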
One idea would be to build one segment after the other: first swap all 0's into a segment [0, i0) at the beginning, then swap all 1's into a new segment starting at i0, and so on. This is O(n * k) (although the range left to scan shrinks with each pass), but each pass starts where the previous one ended, so it is not parallelizable.
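The segment-after-segment idea could look roughly like this in-place sketch (the name `partition_by_passes` is hypothetical). Each pass scans the unprocessed range [lo, n) and swaps every item of the current category to position lo; the last category needs no pass because whatever remains already belongs to it.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def partition_by_passes(items: List[T], f: Callable[[T], int], k: int) -> None:
    lo = 0  # items[0:lo] are already in their final segments
    n = len(items)
    # One sequential pass per category; category k-1 is left over at the end.
    for c in range(k - 1):
        for i in range(lo, n):
            if f(items[i]) == c:
                # Grow segment c by swapping the match to its end.
                items[lo], items[i] = items[i], items[lo]
                lo += 1
```

Running it on `[5, 3, 8, 1, 4]` with `lambda x: x % 3` and `k = 3` leaves the list as `[3, 1, 4, 5, 8]`. Note that pass c cannot start until pass c-1 has fixed `lo`, which is where the sequential dependency comes from.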
Another way would be to use an O(n log n) sorting algorithm, which may be parallelizable, but this is likely not optimal, since there are only k distinct keys and most items compare as equal.
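As an illustration of the sorting approach, the sketch below simply sorts by category. It uses Python's built-in `list.sort` with `key=f` as a stand-in for "an O(n log n) sort". That sort is sequential, and Timsort uses a temporary merge buffer, so it is not strictly swap-only. It only shows that partitioning reduces to sorting on the category key. The name `partition_by_sorting` is hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def partition_by_sorting(items: List[T], f: Callable[[T], int]) -> None:
    # list.sort sorts in place and is stable; with key=f, items are
    # compared only by category, so items with the same category keep
    # their relative order.
    items.sort(key=f)
```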
My question is: what would be a good approach for this problem, and what is this problem called in the literature?