For data-parallel algorithms on the GPU with CUDA, there are two standard libraries, CUDPP and Thrust, which implement primitives such as sorting, reduction, and prefix sum (scan).
What are the main differences between the two libraries in terms of performance and features?