Questions tagged [cub]

CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model.

CUB (CUDA UnBound) is a C++ template library of components for use on NVIDIA GPUs running CUDA.

CUB includes common data parallel operations such as prefix scan, reduction, histogram and sort. CUB's collective primitives are not bound to any particular width of parallelism or to any particular data type and can be used at device, block, warp or thread scope.

It is used in the backend of other NVIDIA libraries, most prominently Thrust and RAPIDS.

CUB is developed by NVIDIA Research, and its website and documentation are hosted at https://nvlabs.github.io/cub, with the most recent source code available on GitHub. It has also been distributed with the CUDA Toolkit since at least CUDA 11.1.1 (the first version in which the CUB documentation is linked from the CUDA Toolkit documentation).

48 questions
2
votes
2 answers

CUB select if with returned indexes

I have recently been running into performance issues when using the Thrust library. These come from Thrust allocating memory in the base of a large nested loop structure. This is obviously unwanted, with ideal execution using a pre-allocated slab of…
ebarr
2
votes
1 answer

Making CUB blockradixsort on-chip entirely?

I am reading the CUB documentation and examples: #include // or equivalently __global__ void ExampleKernel(...) { // Specialize BlockRadixSort for 128 threads owning 4 integer items…
yidiyidawu
2
votes
2 answers

Reduction in CUDA

I'm just starting to learn CUDA programming, and I'm somewhat confused about reduction. I know that global memory has much higher access latency than shared memory, but can I use global memory to (at least) simulate a behavior similar to shared…
Narusaki
1
vote
1 answer

Getting total execution time of all kernels on a CUDA stream

I know how to time the execution of one CUDA kernel using CUDA events, which is great for simple cases. But in the real world, an algorithm is often made up of a series of kernels (CUB::DeviceRadixSort algorithms, for example, launch many kernels to…
Baxissimo
1
vote
1 answer

cub::DeviceRadixSort fails when specifying end bit

I am using the GPU radix sort algorithm of the CUB library to sort N 32-bit unsigned integers whose values all utilize only k of their 32 bits, starting from the least significant bit. Thus, I specify the bit subrange [begin_bit, end_bit) when…
huzzm
1
vote
1 answer

How to use cub::DeviceReduce::ArgMin()

I am confused about how to use cub::DeviceReduce::ArgMin(). Here I copy the code from the CUB documentation. #include // Declare, allocate, and initialize device-accessible pointers for input and output int …
Kurt
1
vote
1 answer

May I use CUDA CUB iterator instead of thrust?

Is it possible to use iterators with CUB as with Thrust? I want to use CUB instead of Thrust as follows: __global__ void reduce_roster(thrust::device_vector::iterator vect, float * tab_seq, int SEUIL_ROSTER) { int tid = blockIdx.x * blockDim.x…
Driss DS Idrissi
1
vote
1 answer

CUB reduction using 2D grid of blocks

I'm trying to make a sum using the CUB reduction method. The big problem is: I'm not sure how to return the values of each block to the Host when using 2-dimensional grids. #include #include #include…
1
vote
1 answer

fatal error: cub/cub.cuh: No such file or directory

I am new to CUDA and CUB. I found the following code and tried to compile it, but I got this error: fatal error: cub/cub.cuh: No such file or directory. The CUDA version is 7.0.27. How can I fix this error? Thanks! #include #include…
S.M.K
1
vote
1 answer

Including the CUB header triggers many Visual Studio Intellisense errors

Whenever I include the header file, Visual Studio's IntelliSense reports thousands of errors. As you can see in the attached screenshot, the application consists of an empty main() function and an include line. I have defined additional include…
PatrykB
1
vote
1 answer

How to use CUB and Thrust in one CUDA code

I'm trying to introduce some CUB into my "old" Thrust code, and so have started with a small example to compare thrust::reduce_by_key with cub::DeviceReduce::ReduceByKey, both applied to thrust::device_vectors. The thrust part of the code is fine,…
1
vote
1 answer

cub BlockRadixSort: how to deal with large tile size or sort multiple tiles?

When using cub::BlockRadixSort to sort within a block, how do we deal with a number of elements that is too large? If we set the tile size too large, the shared memory for the temporary storage will soon not be able to hold it. If…
shaoyl85
1
vote
1 answer

reduction example using cuda and CUB

I'm trying to get my head around CUB, and having a bit of trouble following the (rather incomplete) worked examples. CUB looks like a fantastic tool; I just can't make sense of the example code. I've built a simple proto-warp reduce…
user2462730
0
votes
0 answers

Where can I locate the taxonomy.txt file for the CUB dataset?

I'm working on a replication and I need the taxonomy.txt file for the CUB dataset. Can anyone point me to the repo containing this file? Thank you.
Jay
0
votes
1 answer

CUB device scan with custom scan op fails

I am using CUB::InclusiveScan, which takes a custom, non-commutative binary operator. When defining my template struct MultAddFunctor { const T factor; MultAddFunctor(T factor) : factor(factor) {} __device__…
coderforlife