I would like to transform values and sort them in one go, like this:

thrust::device_vector<int> dataIn  = ...
thrust::device_vector<int> dataOut = ...
auto iterIn = cub::TransformInputIterator<int, Ftor, int*>(dataIn.begin(), Ftor());
cub::DeviceRadixSort::SortKeys(dTemp, tempBytes, iterIn, dataOut.begin(), numElems);

However, SortKeys requires raw pointers rather than iterators. Is it possible to make this work with iterators nonetheless? I know this is possible with Thrust, but I want to use CUB.

Thanks for the suggestions.

hrvthzs
  • You do realize that Thrust uses CUB internally for its sort implementations? – talonmies Sep 05 '18 at 04:50
  • @talonmies Yes, I know, and I have two reasons not to use Thrust. First, with Thrust I cannot preallocate and reuse temporary storage, which is crucial for me. Second, CUB alone is faster, which is probably also a consequence of the first point. – hrvthzs Sep 05 '18 at 05:57
  • You can preallocate and reuse temporary storage with thrust. See [here](https://stackoverflow.com/questions/48670284/cuda9-thrust-sort-by-key-overlayed-with-h2d-copy-using-streams). – Robert Crovella Sep 05 '18 at 21:58
  • @RobertCrovella Thank you very much for the suggestion. Plottings on the CUB GitHub page show significant differences between corresponding CUB and Thrust calls. Do you know why Thrust is slower? I assume the benchmark was done using a cached allocator. – hrvthzs Sep 06 '18 at 15:24
  • I'm not sure which "plottings" you're referring to, but they probably date back to a point in time where thrust was not using CUB under the hood. Thrust predated CUB, so it was a natural thing to benchmark against when CUB appeared. I would be surprised if thrust is much slower than CUB for an apples-to-apples sorting comparison (which would probably include removing the time spent allocating temporary buffers.) – Robert Crovella Sep 07 '18 at 04:26
  • I am referring to these plots: https://nvlabs.github.io/cub/structcub_1_1_device_partition.html – hrvthzs Sep 07 '18 at 08:21
  • Those are against an older version (1.7.1) of thrust that was not using cub. If you want a current performance comparison, you should probably compare things as they are today. If you look at the thrust [changelog](https://github.com/thrust/thrust/blob/master/CHANGELOG) you'll see a new sort implementation introduced in 1.8.0, credited to Duane Merrill, the author of CUB. – Robert Crovella Sep 07 '18 at 13:59
  • I see, thank you very much for the clarification – hrvthzs Sep 08 '18 at 17:00

1 Answer

Sorry to disappoint, but AFAIK CUB doesn't support this. It could, theoretically, with deeper templatization, but it doesn't.

You could copy the relevant code out of CUB, or modify CUB's code by adding an extra template parameter. That would be a headache, but it is doable if all you want to do is pass the input values through some transformation with a device-side function.
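If patching CUB is too much work, a simpler fallback is to materialize the transformed keys first and then hand raw pointers to SortKeys. Here is a minimal sketch of that two-pass approach. Note that it is not the fused version asked for: it costs one extra pass over the data and one extra buffer. The functor body (negation) and the helper name transformThenSort are made up for illustration, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>
#include <vector>

// Hypothetical stand-in for the question's Ftor: negates each value.
struct Ftor {
    __host__ __device__ int operator()(int x) const { return -x; }
};

// Hypothetical helper: transform on the device, then sort the result with CUB.
std::vector<int> transformThenSort(const std::vector<int>& host) {
    const int numElems = static_cast<int>(host.size());
    thrust::device_vector<int> dataIn(host.begin(), host.end());
    thrust::device_vector<int> transformed(numElems);  // extra buffer
    thrust::device_vector<int> dataOut(numElems);

    // Pass 1: write the transformed keys to a plain device buffer.
    thrust::transform(dataIn.begin(), dataIn.end(), transformed.begin(), Ftor());

    // SortKeys takes raw device pointers, so unwrap the Thrust vectors.
    const int* dKeysIn  = thrust::raw_pointer_cast(transformed.data());
    int*       dKeysOut = thrust::raw_pointer_cast(dataOut.data());

    // First call with a null temp pointer only reports the required size.
    void*  dTemp     = nullptr;
    size_t tempBytes = 0;
    cub::DeviceRadixSort::SortKeys(dTemp, tempBytes, dKeysIn, dKeysOut, numElems);

    // Allocated here for brevity; in real code this buffer can be
    // allocated once and reused across many sorts.
    cudaMalloc(&dTemp, tempBytes);

    // Pass 2: the actual sort.
    cub::DeviceRadixSort::SortKeys(dTemp, tempBytes, dKeysIn, dKeysOut, numElems);
    cudaFree(dTemp);

    // Copy the sorted keys back to the host.
    thrust::host_vector<int> sorted = dataOut;
    return std::vector<int>(sorted.begin(), sorted.end());
}
```

Both the temporary storage and the transformed buffer are caller-owned here, so the preallocate-and-reuse requirement is still met. If the original input is not needed afterwards, the transform could presumably write back into dataIn and save the extra buffer.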

einpoklum