
I want to sort an array in shared memory, in parallel, without exiting the kernel.

I can sort an array in global memory using Thrust for CUDA, but that can only be done from the host. I would have to exit the kernel, which means I would lose all the local memory in my threads; when I relaunch another kernel, I would have to refill the local memory.
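
For reference, a minimal sketch of that host-side baseline (the function name and the `d_data` pointer are just illustrative; it assumes a previous kernel has already filled the array in global memory):

```cuda
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// Sort an array that lives in device global memory, driven from the host.
// This runs between kernel launches, so any shared or local memory state
// from the previous kernel is already gone by the time it executes.
void sortOnDeviceFromHost(int *d_data, size_t n)
{
    // Wrap the raw device pointer so Thrust treats it as device data.
    thrust::device_ptr<int> p(d_data);
    thrust::sort(p, p + n);  // sorts in place in global memory
}
```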

Are there any libraries for this? Or is there any way to pause the kernel, return to the host, use Thrust to sort the array on the device, and then resume the kernel?

AbrahamDaniel
  • A library call would normally imply something that was done from the host, so it would not have knowledge of whatever the contents of shared memory were previously. An ordinary (non-CUDA-dynamic-parallelism) *device-callable* library would not be able to spawn multiple threads to do anything in parallel. But there are a variety of sorting examples in the CUDA samples; in particular, the [sortingNetworks sample](http://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-sorting-networks) includes an oddEvenMergeSortShared kernel that might be adaptable to sort your existing shared memory data (a minimal sketch along those lines follows these comments). – Robert Crovella Feb 24 '13 at 08:16
  • How large is your shared memory array? – talonmies Feb 24 '13 at 11:27
  • My program might use about 25 KB at most. – AbrahamDaniel Feb 24 '13 at 11:46
  • Shared memory is a per-block resource. So you are saying that each block will use 25 KB of shared memory? – talonmies Feb 24 '13 at 12:22
  • @talonmies Yes, each block might use about 25 KB. I'm using a Fermi card, and I read that it allows a max of 48 KB per block. – AbrahamDaniel Feb 24 '13 at 13:19
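
Following the sortingNetworks suggestion above, here is a minimal sketch of an in-kernel sorting network over shared memory: a bitonic sort with one element per thread. `BLOCK_SIZE`, `bitonicSortShared`, and the `in`/`out` buffers are illustrative names, and it assumes the array length equals the block size and is a power of two; the sample's oddEvenMergeSortShared kernel handles more general cases, such as multiple elements per thread.

```cuda
// Illustrative size: must be a power of two for this bitonic network.
#define BLOCK_SIZE 512

// Sorts a shared-memory array inside the kernel, one element per thread.
// The block keeps running afterwards, so shared/local state is preserved.
__global__ void bitonicSortShared(const int *in, int *out)
{
    __shared__ int s[BLOCK_SIZE];

    unsigned tid = threadIdx.x;
    s[tid] = in[tid];   // stands in for data the block already holds
    __syncthreads();

    // Bitonic sorting network: k is the size of the bitonic sequences
    // being merged, j is the compare-exchange stride within a stage.
    for (unsigned k = 2; k <= BLOCK_SIZE; k <<= 1) {
        for (unsigned j = k >> 1; j > 0; j >>= 1) {
            unsigned partner = tid ^ j;
            if (partner > tid) {
                // Alternate sort direction per k-sized subsequence.
                bool ascending = ((tid & k) == 0);
                if ((s[tid] > s[partner]) == ascending) {
                    int tmp    = s[tid];
                    s[tid]     = s[partner];
                    s[partner] = tmp;
                }
            }
            __syncthreads();  // uniform loop bounds: all threads reach this
        }
    }

    out[tid] = s[tid];  // or keep using s[] in the rest of the kernel
}
```

Launched as `bitonicSortShared<<<1, BLOCK_SIZE>>>(d_in, d_out)`, each block sorts its own shared-memory copy independently, which is what keeps the rest of the kernel's state alive.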

0 Answers