
Whenever I call cudaMemPrefetchAsync() it returns the error code cudaErrorInvalidDevice. I am sure that I am passing the right device id (I have only one CUDA-capable GPU in my laptop, with id == 0).

I believe the code sample posted below is error-free, but at line 52 (the call to cudaMemPrefetchAsync()) I keep getting this error.


I tried:

  1. A clean driver installation (latest version).
  2. Searching Google for an answer, but I could not find any. (I managed only to find this.)

(I have no idea what else to try.)


System Spec:

OS: Microsoft Windows 8.1 x64 Home
IDE: Visual Studio 2015
CUDA toolkit: 8.0.61
NVIDIA GPU: GeForce GTX 960M
NVIDIA GPU driver: ver. 381.65 (latest)
Compute Capability: 5.0 (Maxwell)
Unified Memory: supported
Intel integrated GPU: Intel HD Graphics 4600


Code Sample:

/////////////////////////////////////////////////////////////////////////////////////////////////////////
// TEST AREA:
// -- INCLUDE: 
/////////////////////////////////////////////////////////////////////////////////////////////////////////

// Cuda Libs: ( Device Side ):
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

// Std C++ Libs:
#include <cstdio>       // getchar()
#include <iostream>
#include <iomanip>
///////////





/////////////////////////////////////////////////////////////////////////////////////////////////////////
// TEST AREA:
// -- NAMESPACE:
/////////////////////////////////////////////////////////////////////////////////////////////////////////
using namespace std;
///////////





/////////////////////////////////////////////////////////////////////////////////////////////////////////
// TEST AREA:
// -- START POINT:
/////////////////////////////////////////////////////////////////////////////////////////////////////////
int main() {

    // Set cuda Device:
    if (cudaSetDevice(0) != cudaSuccess)
        cout << "ERROR: cudaSetDevice(***)" << endl;

    // Array:
    unsigned int size = 1000;
    double * d_ptr = nullptr;

    // Allocate unified memory:
    if (cudaMallocManaged(&d_ptr, size * sizeof(double), cudaMemAttachGlobal) != cudaSuccess)
        cout << "ERROR: cudaMallocManaged(***)" << endl;

    if (cudaDeviceSynchronize() != cudaSuccess)
        cout << "ERROR: cudaDeviceSynchronize(***)" << endl;

    // Prefetch:
    if(cudaMemPrefetchAsync(d_ptr, size * sizeof(double), 0) != cudaSuccess)
        cout << "ERROR: cudaMemPrefetchAsync(***)" << endl;

    // Exit:
    getchar();
}
///////////
  • The documentation says the device must have a non-zero cudaDevAttrConcurrentManagedAccess attribute. Have you checked that? – talonmies Apr 15 '17 at 19:26
  • Well, I have now, and it says that I don't have support for this: `(cudaDeviceProp)devProp.concurrentManagedAccess == 0`. But from what I understand, this feature was introduced with `compute capability == 6.0` (Pascal), while `cudaMemPrefetchAsync(***)` was first introduced for devices with `compute capability == 3.0` (Kepler). Therefore I should still be able to use it even though I am only on `5.0`. – PatrykB Apr 15 '17 at 19:37
  • @talonmies Could you be so kind as to point me to the documentation page where you found it? – PatrykB Apr 15 '17 at 19:40
  • http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge8dc9199943d421bc8bc7f473df12e42 – talonmies Apr 15 '17 at 19:59
  • Thank you. You were right, I was wrong. My GPU does not support this feature. I can still use unified memory, but without `cudaMemPrefetchAsync(***)`. – PatrykB Apr 15 '17 at 20:03
  • Please add a short answer to your question. – talonmies Apr 15 '17 at 20:04
  • This API call is for Pascal GPUs. You don't have a Pascal GPU, and the call would be redundant anyway because [pre-Pascal UM will automatically migrate managed data to the GPU](http://stackoverflow.com/questions/39782746/why-is-nvidia-pascal-gpus-slow-on-running-cuda-kernels-when-using-cudamallocmana/40011988#40011988) at kernel launch. – Robert Crovella Apr 15 '17 at 20:19

1 Answer


Thanks to talonmies I have realized that my GPU does not support the prefetch feature. In order to use cudaMemPrefetchAsync(***), the GPU must have a non-zero value in (cudaDeviceProp)deviceProp.concurrentManagedAccess.

See more here.
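For reference, here is a minimal sketch (not part of my original code, just how I do the check now) that queries concurrentManagedAccess first and only prefetches when the device actually supports it:

// Check device support for concurrent managed access before prefetching.
#include <cuda_runtime.h>
#include <iostream>

int main() {

    // Set cuda Device:
    int device = 0;
    if (cudaSetDevice(device) != cudaSuccess)
        std::cout << "ERROR: cudaSetDevice(***)" << std::endl;

    // Query the device properties; cudaDeviceGetAttribute() with
    // cudaDevAttrConcurrentManagedAccess would work just as well.
    cudaDeviceProp devProp;
    cudaGetDeviceProperties(&devProp, device);

    // Allocate unified memory:
    double * d_ptr = nullptr;
    size_t bytes = 1000 * sizeof(double);
    if (cudaMallocManaged(&d_ptr, bytes, cudaMemAttachGlobal) != cudaSuccess)
        std::cout << "ERROR: cudaMallocManaged(***)" << std::endl;

    if (devProp.concurrentManagedAccess != 0) {
        // Supported (Pascal and newer): prefetch the managed allocation to the GPU.
        cudaError_t err = cudaMemPrefetchAsync(d_ptr, bytes, device);
        std::cout << "cudaMemPrefetchAsync: " << cudaGetErrorString(err) << std::endl;
    } else {
        // Not supported (e.g. my GTX 960M): skip the prefetch; managed data
        // is migrated automatically at kernel launch on pre-Pascal GPUs.
        std::cout << "concurrentManagedAccess == 0, skipping prefetch" << std::endl;
    }

    cudaFree(d_ptr);
    return 0;
}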

  • Please remember to come back in a couple of days and accept this answer so your question falls off the unanswered list for the CUDA tag. – talonmies Apr 16 '17 at 10:48