CUDA Sort Shared Memory

Asked Feb 24 '13 at 07:56

I want to sort an array in shared memory in parallel without exiting the kernel.

I can sort an array in global memory using Thrust for CUDA, but that can only be done from the host, so I would have to exit the kernel. That would mean losing all the local memory in my threads: when I launch another kernel, I would have to refill the local memory.

Are there any libraries for this? Or is there any way I could pause the kernel, return to the host, use Thrust to sort the array on the device, and then resume the kernel?

asked Feb 24 '13 at 07:56

AbrahamDaniel

A library call would normally imply something done from the host, so it would have no knowledge of the previous contents of shared memory. An ordinary (non-CUDA-dynamic-parallelism) *device-callable* library would not be able to spawn multiple threads to do anything in parallel. But there are a variety of sorting examples in the CUDA samples; in particular, the [sortingNetworks sample](http://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-sorting-networks) includes an oddEvenMergeSortShared kernel that might be adaptable to sort your existing shared memory data. – Robert Crovella Feb 24 '13 at 08:16
How large is your shared memory array? – talonmies Feb 24 '13 at 11:27
My program might use about 25 KB at most. – AbrahamDaniel Feb 24 '13 at 11:46
Shared memory is a per-block resource. So you are saying that each block will use 25 KB of shared memory? – talonmies Feb 24 '13 at 12:22
@talonmies Yes, each block might use about 25 KB. I'm using a Fermi card, and I read that it allows a maximum of 48 KB per block. – AbrahamDaniel Feb 24 '13 at 13:19
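
To illustrate the idea raised in the comments (sorting data that already lives in shared memory, cooperatively within one block, without leaving the kernel), here is a minimal sketch. It is a plain bitonic sorting network, not the oddEvenMergeSortShared kernel from the sortingNetworks sample. The names `TILE`, `THREADS`, `bitonicSortShared`, `sortTilesKernel` and `sortTilesOnDevice` are hypothetical, and the sketch assumes `int` keys and a power-of-two tile (4096 ints, 16 KB), so an array of about 25 KB would need padding up to the next power of two (32 KB) or a different network.

```cuda
#include <cuda_runtime.h>
#include <climits>
#include <vector>

// Assumed sizes for this sketch (not from the question):
// TILE must be a power of two; 4096 ints = 16 KB of shared memory.
#define TILE    4096
#define THREADS 512

// Sorts s[0..TILE) in ascending order using all threads of the block.
// Every thread of the block must call it, because it uses __syncthreads().
__device__ void bitonicSortShared(int *s)
{
    for (int k = 2; k <= TILE; k <<= 1) {        // length of bitonic runs
        for (int j = k >> 1; j > 0; j >>= 1) {   // compare distance
            for (int i = threadIdx.x; i < TILE; i += blockDim.x) {
                int partner = i ^ j;
                if (partner > i) {               // each pair handled once
                    bool ascending = ((i & k) == 0);
                    int a = s[i];
                    int b = s[partner];
                    if ((a > b) == ascending) {  // out of order: swap
                        s[i] = b;
                        s[partner] = a;
                    }
                }
            }
            __syncthreads();                     // finish step before next
        }
    }
}

// Each block loads one tile into shared memory, sorts it in place,
// and could keep working with the sorted tile and its registers.
__global__ void sortTilesKernel(int *data, int n)
{
    __shared__ int s[TILE];
    int base = blockIdx.x * TILE;

    // Pad the last tile with INT_MAX so padding ends up at the back.
    for (int i = threadIdx.x; i < TILE; i += blockDim.x)
        s[i] = (base + i < n) ? data[base + i] : INT_MAX;
    __syncthreads();

    bitonicSortShared(s);

    // ... further per-block work on the sorted shared array goes here ...

    for (int i = threadIdx.x; i < TILE; i += blockDim.x)
        if (base + i < n) data[base + i] = s[i];
}

// Host helper: sorts each TILE-sized chunk of h independently on the GPU.
// Returns true if all CUDA calls succeeded.
bool sortTilesOnDevice(std::vector<int> &h)
{
    int n = static_cast<int>(h.size());
    if (n == 0) return true;

    int *d = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int blocks = (n + TILE - 1) / TILE;
        sortTilesKernel<<<blocks, THREADS>>>(d, n);
        ok = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);
    return ok;
}
```

Because the sort is an ordinary `__device__` function called by the whole block, registers and shared memory stay live across it. The global-memory load and store in `sortTilesKernel` are only there to make the sketch checkable from the host; in a kernel like the one described in the question, the shared array would already be filled by earlier work in the same kernel.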
