Questions tagged [thrust]

Thrust is a template library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL) for NVIDIA CUDA.

Thrust is a library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL) which interoperates with technologies such as CUDA, OpenMP, and TBB. Thrust provides a flexible high-level interface for parallel programming designed to greatly simplify running common data parallel processing tasks on GPUs and multicore CPUs, with the intention of enhancing developer productivity.

As of CUDA release 4.0, a Thrust snapshot is included in every release of the standard CUDA toolkit distribution. Under Debian and Ubuntu, Thrust may also be installed through apt-get:

sudo apt-get install libthrust-dev

The project homepage contains documentation and sample code demonstrating usage of the library. The latest source, bug reports, and discussions are always available on GitHub.

The rocThrust project (GitHub) by AMD is a port of Thrust to their HIP/ROCm platform.

959 questions
7
votes
4 answers

Generating random numbers with uniform distribution using Thrust

I need to generate a vector with random numbers between 0.0 and 1.0 using Thrust. The only documented example I could find produces very large random numbers (thrust::generate(myvector.begin(), myvector.end(), rand)). I'm sure the answer is simple,…
user1701865
  • 79
  • 1
  • 4
7
votes
1 answer

Array of vectors using Thrust

Is it possible to create an array of device_vectors using Thrust? I know I can't create a device_vector of device_vectors, but how would I create an array of device_vectors?
Manolete
  • 3,431
  • 7
  • 54
  • 92
7
votes
1 answer

Can I use thrust::host_vector or I must use cudaHostAlloc for zero-copy with Thrust?

I want to use zero-copy on mapped memory via cudaHostGetDevicePointer. Can I use thrust::host_vector, or must I use cudaHostAlloc(...,cudaHostAllocMapped)? Or is it somehow easier to do with Thrust?
Alex
  • 12,578
  • 15
  • 99
  • 195
7
votes
2 answers

How good is OpenCV GPU library for matrix operations?

I'm using OpenCV for an application in computer vision. I'd like to accelerate some matrix operations (matrices are fairly large) on GPU and want to avoid coding directly in CUDA C, if possible. OpenCV 2.4.1 has a number of GPU accelerated…
Alexey
  • 5,898
  • 9
  • 44
  • 81
6
votes
2 answers

How to decrement each element of a device_vector by a constant?

I'm trying to use thrust::transform to subtract a constant value from each element of a device_vector. As you can see, the last line is incomplete. I'm trying to subtract the constant fLowestVal from all elements but don't know how…
igal k
  • 1,883
  • 2
  • 28
  • 57
6
votes
1 answer

Thrust Static Assertion when using in cpp files

I am trying to compile and run a simple CUDA/Thrust program. It works when the source file extension is .cu but fails when the extension is .cpp. I already applied the required changes for the .cpp file in CMake, but I am getting the error: static…
AMCoded
  • 1,374
  • 2
  • 24
  • 39
6
votes
1 answer

What is the difference between thrust::host_vector and std::vector?

Both allocate memory on the host and I can copy contents to device_vector and back using iterators. Why was host_vector necessary to include in the API? Does it have something to do with pinned memory?
Souradeep Nanda
  • 3,116
  • 2
  • 30
  • 44
6
votes
1 answer

Using thrust with printf / cout

I'm trying to learn how to use CUDA with Thrust, and I have seen a piece of code where the printf function seems to be used from the device. Consider this code: #include #include #include…
bct
  • 285
  • 2
  • 11
6
votes
3 answers

Strided reduction by CUDA Thrust

I have an array of vertices with this kind of structure: [x0, y0, z0, empty float, x1, y1, z1, empty float, x2, y2, z2, empty float, ...] I need to find minX, minY, minZ, maxX, maxY and maxZ using CUDA. I wrote a proper reduction algorithm, but it…
aerion
  • 702
  • 1
  • 11
  • 28
6
votes
2 answers

Poor performance when calling cudaMalloc with 2 GPUs simultaneously

I have an application where I split the processing load among the GPUs on a user's system. Basically, there is one CPU thread per GPU that initiates a GPU processing interval when triggered periodically by the main application thread. Consider the…
rmccabe3701
  • 1,418
  • 13
  • 31
6
votes
1 answer

How to debug cuda thrust functions in visual studio 2010 with parallel nsight

I am using Visual Studio 2010, Parallel Nsight 2.2, and CUDA 4.2 for learning. My system is Windows 8 Pro x64. I opened the radix sort project, which is included with the CUDA computing SDK, in VS and compiled it with no errors. The sort code uses Thrust…
Miles Xu
  • 61
  • 1
  • 4
6
votes
3 answers

How to generate random permutations with CUDA

What parallel algorithms could I use to generate random permutations from a given set? Especially proposals or links to papers suitable for CUDA would be helpful. A sequential version of this would be the Fisher-Yates shuffle. Example: Let S={1, 2,…
diver_182
  • 261
  • 6
  • 13
6
votes
3 answers

CUDA Thrust slow when operating large vectors on my machine

I'm a CUDA beginner reading some Thrust tutorials. I wrote some simple but terribly organized code to try to measure the speedup from Thrust (is this idea correct?). I try to add two vectors (with 10000000 ints) to another vector, by adding…
Tony
  • 127
  • 1
  • 6
6
votes
1 answer

push_back using Thrust library

Is it possible to use push_back with the Thrust library? And what about a vector of vectors? I would like to do on the GPU what on the CPU is: vector< vector > MyVector( 100 ); ... MyVector[i].push_back(j); Is there a way to use it like for…
Manolete
  • 3,431
  • 7
  • 54
  • 92
6
votes
2 answers

Thrust vectorized search: Efficiently combine lower_bound and binary_search to find both position and existence

I'm trying to use Thrust to detect if each element of an array can be found in another array and where (both arrays are sorted). I came across the vectorized search routines (lower_bound and binary_search). lower_bound will return for each value the…
tat0
  • 133
  • 1
  • 5