Questions tagged [nvidia]

For programming questions specifically related to Nvidia hardware. N.B. Questions about system configuration are usually off-topic here!

Nvidia is an American global technology company based in Santa Clara, California, best known for its graphics processors (GPUs).

More about Nvidia at http://en.wikipedia.org/wiki/Nvidia
Nvidia website at http://www.nvidia.com/content/global/global.php

3668 questions
12 votes, 3 answers

What is the path for libcudart.so?

I'm trying to install the GPU version of TensorFlow and I'm stuck at this step. I've installed nvidia-cuda-toolkit by running sudo apt install nvidia-cuda-toolkit, and it downloaded fine, but I'm unable to locate libcudart.so. Please specify which gcc nvcc…
user6384481
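
One way to approach the lookup described in this question is to search a few likely library directories for files named libcudart.so*. The following is a minimal sketch, not a definitive answer: the directory list in CANDIDATE_DIRS and the helper name find_libcudart are assumptions made for illustration, and the real location depends on the distribution and on how the toolkit was installed.

```python
import glob
import os

# Assumed candidate directories; adjust for your distribution and install method.
CANDIDATE_DIRS = [
    "/usr/lib/x86_64-linux-gnu",
    "/usr/local/cuda/lib64",
    "/usr/lib/cuda/lib64",
]


def find_libcudart(dirs=None):
    """Return sorted paths of files matching libcudart.so* in the given directories.

    When dirs is None, search CANDIDATE_DIRS plus any LD_LIBRARY_PATH entries.
    """
    if dirs is None:
        env_dirs = [
            d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d
        ]
        dirs = CANDIDATE_DIRS + env_dirs
    hits = []
    for directory in dirs:
        hits.extend(glob.glob(os.path.join(directory, "libcudart.so*")))
    return sorted(set(hits))


if __name__ == "__main__":
    # Print every match, one per line; prints nothing if none is found.
    for path in find_libcudart():
        print(path)
```

If nothing is printed, the library may live outside the assumed directories, in which case a broader file search on the system is the usual next step.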
12 votes, 1 answer

How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?

Can I run non-MPI CUDA applications concurrently on NVIDIA Kepler GPUs with MPS? I'd like to do this because my applications cannot fully utilize the GPU, so I want them to co-run together. Is there any code example to do this?
dalibocai
  • 2,289
  • 5
  • 29
  • 45
12 votes, 2 answers

Matrix-vector multiplication in CUDA: benchmarking & performance

I'm updating my question with some new benchmarking results (I also reformulated the question to be more specific and I updated the code)... I implemented a kernel for matrix-vector multiplication in CUDA C following the CUDA C Programming Guide…
Pantelis Sopasakis
  • 1,902
  • 5
  • 26
  • 45
12 votes, 2 answers

Way to get floating-point special values in CUDA?

Are there any device functions in CUDA to obtain IEEE 754 special values like inf and NaN? I mean a stable way, not math operations that could be optimized out by compilers. I only managed to find a device function called nan() that must take some…
user0002128
  • 2,785
  • 2
  • 23
  • 40
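
The special values mentioned in this question have fixed IEEE 754 single-precision bit patterns, so they can be produced by reinterpreting an integer constant rather than by arithmetic that a compiler might fold away. The following host-side Python sketch only illustrates those bit patterns; the helper name float32_from_bits is hypothetical. In CUDA device code, the __int_as_float intrinsic performs the analogous reinterpretation.

```python
import math
import struct


def float32_from_bits(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Exponent bits all ones, mantissa zero: infinity (sign bit selects +/-).
POS_INF = float32_from_bits(0x7F800000)
NEG_INF = float32_from_bits(0xFF800000)

# Exponent bits all ones, mantissa non-zero: NaN (top mantissa bit set means quiet NaN).
QUIET_NAN = float32_from_bits(0x7FC00000)

if __name__ == "__main__":
    print(POS_INF, NEG_INF, QUIET_NAN)
```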
12 votes, 4 answers

Benchmarks comparing Intel Xeon Phi and Nvidia Tesla K20

To my surprise, I cannot find a comparison of these products using open source OpenCL benchmark suites, such as rodinia and SHOC. Such a comparison could be more interesting than comparisons of theoretical peak performance, or of performance in…
Matt
  • 569
  • 1
  • 4
  • 16
11 votes, 5 answers

Failed to initialize NVML: Unknown Error in Docker after a few hours

I am having an interesting and weird issue. When I start a Docker container with GPU support, it works fine and I see all the GPUs in Docker. However, a few hours or a few days later, I can't use the GPUs in Docker. When I run nvidia-smi in the Docker machine, I see this…
Justin Song
  • 111
  • 1
  • 4
11 votes, 1 answer

What is the difference between cuda.amp and model.half()?

According to https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/ we can use: with torch.cuda.amp.autocast(): loss = model(data) in order to cast operations to mixed precision. Another…
user3668129
  • 4,318
  • 6
  • 45
  • 87
11 votes, 2 answers

Which GPU should I use on Google Cloud Platform (GCP)?

Right now, I'm working on my master's thesis and I need to train a huge Transformer model on GCP. The fastest way to train deep learning models is to use a GPU, so I was wondering which GPU I should use among the ones provided by GCP. The ones…
Anwarvic
  • 12,156
  • 4
  • 49
  • 69
11 votes, 3 answers

tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error

I am trying to use a GPU with TensorFlow. My TensorFlow version is 2.4.1 and I am using CUDA version 11.2. Here is the output of nvidia-smi. +-----------------------------------------------------------------------------+ | NVIDIA-SMI 460.39 …
Ricky
  • 635
  • 2
  • 5
  • 20
11 votes, 4 answers

How to download cuDNN straight from the Nvidia website to my Linux instance on GCP

I want to install tensorflow-gpu on my Linux machine on Google Cloud Platform. I am not using a Deep Learning VM that GCP provides, so I installed Anaconda on my Linux instance and now I want to install TensorFlow. I already installed the Nvidia drivers and…
Gayal Kuruppu
  • 1,261
  • 1
  • 17
  • 29
11 votes, 1 answer

GPU RAM occupied but no PIDs

nvidia-smi shows the following, indicating 3.77GB utilized on GPU0 but no processes listed for GPU0: (base) ~/.../fast-autoaugment$ nvidia-smi Fri Dec 20 13:48:12 2019 …
Shital Shah
  • 63,284
  • 17
  • 238
  • 185
11 votes, 1 answer

Why does vkGetPhysicalDeviceMemoryProperties return multiple identical memory types?

So, I'm gathering some info about my device in Vulkan during initialization and find a unique (or rather, quite similar) set of memory types returned by vkGetPhysicalDeviceMemoryProperties: Device Name: GeForce GTX 1060 3GB Device ID: 7170 Device…
Frzn Flms
  • 468
  • 1
  • 7
  • 18
11 votes, 2 answers

How to fix low volatile GPU-Util with Tensorflow-GPU and Keras?

I have a 4 GPU machine on which I run Tensorflow (GPU) with Keras. Some of my classification problems take several hours to complete. nvidia-smi returns Volatile GPU-Util which never exceeds 25% on any of my 4 GPUs. How can I increase GPU Util%…
11 votes, 3 answers

Error response from daemon: Get https://nvcr.io/v2/: unauthorized: authentication required

I have begun using the NVIDIA GPU Cloud deep learning platform. I tried to pull an image from the console (Command Prompt): docker pull nvcr.io/nvidia/pytorch:17.10 and got the message: Error response from daemon: Get https://nvcr.io/v2/: unauthorized: authentication…
Roman
  • 19,236
  • 15
  • 93
  • 97
11 votes, 2 answers

What's the meaning of the params x, y, z, w in the function cudaCreateChannelDesc?

cudaCreateChannelDesc(int x, int y, int z, int w, enum cudaChannelFormatKind f); Now I have some example code: cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat); I have no idea why x=32 and y=z=w=0. Could someone explain why the channel…
biaodiluer
  • 121
  • 1
  • 4
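
In the CUDA runtime API, the x, y, z and w arguments of cudaCreateChannelDesc give the number of bits in each of up to four components of a texel, so (32, 0, 0, 0) with cudaChannelFormatKindFloat describes a single 32-bit float component. The following host-side Python sketch only models that bookkeeping; the ChannelDesc class and its methods are hypothetical stand-ins for illustration, not part of any CUDA binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelDesc:
    """Hypothetical host-side stand-in for a CUDA channel format descriptor."""

    x: int  # bits in the first component
    y: int  # bits in the second component (0 if unused)
    z: int  # bits in the third component (0 if unused)
    w: int  # bits in the fourth component (0 if unused)
    kind: str  # "float", "signed" or "unsigned" (illustrative labels)

    def num_components(self):
        """Count the components that have a non-zero bit width."""
        return sum(1 for bits in (self.x, self.y, self.z, self.w) if bits > 0)

    def texel_bytes(self):
        """Total size of one texel in bytes."""
        return (self.x + self.y + self.z + self.w) // 8


# One 32-bit float per texel, as in the example from the question.
single_float = ChannelDesc(32, 0, 0, 0, "float")

# Four 8-bit unsigned components per texel, e.g. an RGBA image.
rgba8 = ChannelDesc(8, 8, 8, 8, "unsigned")
```

Under this model, single_float has one component and occupies 4 bytes per texel, while rgba8 has four components and also occupies 4 bytes per texel.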