Questions tagged [nvidia-smi]
43 questions
0
votes
0 answers
Yarn Distributed-shell + GPU not showing nvidia-smi on output
I have a Hadoop/YARN multi-node cluster on Ubuntu 22.04, and I have added GPU resources to the cluster following the Hadoop instructions here: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/UsingGpus.html
When I ran the command,…

khmx
0
votes
0 answers
Why do my two GPUs have different GPU memory sizes?
I have installed two RTX A6000 GPU cards on my computer (Ubuntu 20.04). When I use the 'nvidia-smi' command, the output is below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver…

wangwei
0
votes
0 answers
PyTorch DDP (with Join Context Manager) consuming more power for uneven data distribution
I am using a two-node distributed setup (each node has a single GPU) to train a neural network (NN). I use PyTorch DistributedDataParallel with the Join context manager to achieve this. I am measuring power consumption varying data…
0
votes
0 answers
How to get the topology of GPU devices?
I want to write a function that returns the topology of the underlying GPU devices as a graph. I want the connections to indicate where data transfer can occur, and the weights to be the throughput capacity of these connections.
I know that…
0
votes
0 answers
nvidia-smi version mismatch error when I try nvidia-smi
When I run nvidia-smi, I get this error:
Failed to initialize NVML: Driver/library version mismatch
But when I run nvcc --version, I get this output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on…

Jasurbek
0
votes
1 answer
How can I check if my GPU is being used or activate it for use?
GeForce 4080 problem
Hello Stack Overflow community,
I am currently working on a project that requires the use of a GPU, and I am not sure whether it is being used or not. Could someone please guide me on how to check if my GPU is being used or…

달이민
0
votes
0 answers
GCP Container Optimized OS Always Requires GPU Driver download and installation, despite caching and creating a custom image post-install
I'm using the Container Optimized OS to run an application that takes advantage of GPUs. I have a separate system that creates VMs to run this application on-demand (to minimize cost) and I've been trying to reduce the time to get my application…

Ethan
0
votes
2 answers
Query GPU memory usage and/or user by PID
I have a list of PIDs of processes running on different GPUs. I want to get the used GPU memory of each process based on its PID. nvidia-smi yields the information I want; however, I don't know how to grep it, as the output is complex. I have…

JoJolyne
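For the per-PID memory question above, one approach that avoids grepping the human-readable table is to ask nvidia-smi for CSV output. The following is a minimal sketch, not the asker's code: the function names `parse_compute_apps` and `gpu_memory_by_pid` are hypothetical, and it assumes the installed nvidia-smi supports `--query-compute-apps` with the `pid` and `used_memory` fields and reports memory in MiB when `nounits` is set.

```python
import subprocess

# Ask nvidia-smi for one "pid, used_memory" row per compute process.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Map PID -> used GPU memory (MiB) from the CSV text.

    A PID that appears on several GPUs has its values summed.
    Non-numeric values (for example "[N/A]") are stored as None.
    """
    usage = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        pid_field, mem_field = (part.strip() for part in line.split(",", 1))
        pid = int(pid_field)
        if mem_field.isdigit():
            usage[pid] = (usage.get(pid) or 0) + int(mem_field)
        else:
            usage.setdefault(pid, None)
    return usage


def gpu_memory_by_pid():
    """Run nvidia-smi and return the parsed PID -> MiB mapping."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Looking up a known PID is then a dictionary access, for example `gpu_memory_by_pid().get(some_pid)`, where `some_pid` is one of the PIDs from the list.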
0
votes
0 answers
nvidia-smi first detects card only with privileged users (root / sudo)
I have an eGPU connected to a laptop with a mobile graphics card on an Ubuntu-based Linux system (Pop!_OS). My eGPU graphics card is not detected when running the $ nvidia-smi command as a regular user. However, my internal graphics card is…

Adrien Pacifico
0
votes
0 answers
nvidia-smi does not work; it keeps showing static but wrong information
On my machine (Ubuntu 20.04.4 LTS), the command "nvidia-smi" suddenly stopped working; it keeps showing static historical information.
Some programs using GPU have stopped, but when running "nvidia-smi", we can still see the GPU usage. However, when we…

Nixon Jin
0
votes
0 answers
Why does nvidia-smi report less used GPU memory than I allocated?
I allocated 12G of GPU memory like this, but when I use nvidia-smi to check the GPU memory usage, it shows only 4G. I don't understand why.
size_t size = 6U * 1024 * 1024 * 1024 / 4;
int *devSrc;
int *devDest;
cudaMalloc((void**)&devSrc,…
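One possible explanation for the 12G versus 4G gap is integer wraparound in the size expression. The sketch below reproduces the arithmetic in Python under two assumptions that the truncated excerpt does not confirm: `6U` is a 32-bit `unsigned int`, so the multiplications wrap modulo 2^32 before the result is stored in `size_t`, and each of the two `cudaMalloc` calls requests `size * sizeof(int)` bytes with a 4-byte `int`. The function names are hypothetical.

```python
UINT32_MOD = 2 ** 32   # assumed width of unsigned int
SIZEOF_INT = 4         # assumed sizeof(int) in bytes
GIB = 1024 ** 3


def wrapped_element_count():
    """Evaluate 6U * 1024 * 1024 * 1024 / 4 with 32-bit unsigned wraparound."""
    value = 6
    for _ in range(3):
        value = (value * 1024) % UINT32_MOD
    return value // 4


def intended_element_count():
    """Evaluate the same expression without wraparound (for example with 6ULL)."""
    return 6 * 1024 ** 3 // 4


def total_gib(element_count, buffers=2):
    """Total size in GiB of `buffers` allocations of element_count ints."""
    return buffers * element_count * SIZEOF_INT / GIB


print(wrapped_element_count())               # 536870912
print(total_gib(wrapped_element_count()))    # 4.0
print(total_gib(intended_element_count()))   # 12.0
```

If those assumptions hold, the two buffers total 4 GiB rather than 12 GiB, which would match what nvidia-smi shows; writing the constant as a 64-bit value (for example `6ULL`) would avoid the wraparound.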
0
votes
0 answers
How to interpret memory.used in nvidia-smi for PyTorch in order to estimate minimum GPU requirements
I'm trying to figure out what the minimum GPU requirement for my application is. Using nvidia-smi as described here in Colab gives me a maximum value for memory.used of 4910MiB. So I presume a 4GB GPU is not enough, correct?
Also on this.. after…

rok
0
votes
0 answers
How do I compare the compute power of the NVIDIA GPU in my desktop with the one in Google Colab, and how do the values differ?
My PC
!nvidia-smi
result
Mon Nov 7 19:55:40 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77 Driver Version: 512.77 CUDA Version: 11.6 …

Vy Do
0
votes
0 answers
Puzzled by OOM Error on GPU when using 15595MiB / 16125MiB
I am using Tensorflow 2.X.
My GPU memory is 16125MiB, but my model requires 15595MiB, according to nvidia-smi.
With this total usage, I get an OOM after some time, even when setting the minimum batch size.
I also tried the following, but as soon as…

Phys
0
votes
0 answers
Docker Container non-root user Failed to initialize NVML: Insufficient Permissions
Outside of the container, my user (UID=1000) is able to use nvidia-smi. However, inside the docker container, the non-root user (same UID of 1000) is unable to use nvidia-smi, running into Failed to initialize NVML: Insufficient Permissions.…

Hyphen Interpause