Questions tagged [nvidia-smi]

43 questions
0 votes • 0 answers

YARN distributed-shell + GPU not showing nvidia-smi output

I have a Hadoop/YARN multi-node cluster on Ubuntu 22.04 and I have added GPU resources to the cluster following the Hadoop instructions here: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/UsingGpus.html When I ran the command,…
0 votes • 0 answers

Why do my two GPUs have different GPU memory?

I have installed two RTX A6000 GPU cards on my computer (Ubuntu 20.04). When I use the 'nvidia-smi' command, the output is below: +-----------------------------------------------------------------------------+ | NVIDIA-SMI 525.105.17 Driver…
wangwei • 45 • 1 • 1 • 7
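
A minimal sketch of how one might compare the two cards side by side, assuming nvidia-smi is on PATH; the helper name gpu_memory_report is hypothetical, and while the --query-gpu fields used here (index, name, memory.total, memory.used) are standard, nvidia-smi --help-query-gpu lists what a given driver supports.

    # Hypothetical helper: print total and used memory per GPU via nvidia-smi.
    import subprocess

    def gpu_memory_report():
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=index,name,memory.total,memory.used",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.strip().splitlines():
            # Example line: "0, NVIDIA RTX A6000, 49140 MiB, 3 MiB"
            print(line)

    gpu_memory_report()
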
0 votes • 0 answers

PyTorch DDP (with Join context manager) consuming more power for uneven data distribution

I am using a 2-node distributed setup (each node with a single GPU) to train a neural network (NN). I use PyTorch Distributed Data Parallel with the Join context manager to achieve this. I am measuring power consumption while varying data…
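
For the power measurement side of this, a simple sketch (not the asker's actual setup) is to poll nvidia-smi's power.draw field on each node while training runs; the function name log_power is hypothetical, and sampling rate and accuracy depend on the GPU and driver.

    # Hypothetical power logger: sample GPU power draw periodically during training.
    import subprocess, time

    def log_power(seconds=60, interval=1.0, logfile="power_watts.log"):
        with open(logfile, "w") as f:
            end = time.time() + seconds
            while time.time() < end:
                watts = subprocess.run(
                    ["nvidia-smi", "--query-gpu=power.draw",
                     "--format=csv,noheader,nounits"],
                    capture_output=True, text=True, check=True,
                ).stdout.strip()
                f.write(f"{time.time():.1f} {watts}\n")
                time.sleep(interval)

    log_power(seconds=10)
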
0 votes • 0 answers

How to get the topology of GPU devices?

I want to write a function that returns the topology of the underlying GPU devices as a graph. I want the connections to indicate where data transfer can occur, and the weights to be the throughput capacity of these connections. I know that…
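
A rough sketch of one starting point, assuming Linux and nvidia-smi on PATH: parse the matrix printed by `nvidia-smi topo -m` into an adjacency structure. The function name gpu_topology is hypothetical, the exact column layout varies by driver version, and the matrix reports link types (e.g. NV1, PIX, PHB, SYS) rather than throughput, so edge weights would still have to be measured or looked up separately.

    # Rough sketch: turn `nvidia-smi topo -m` into an adjacency dict of link types.
    import subprocess

    def gpu_topology():
        out = subprocess.run(["nvidia-smi", "topo", "-m"],
                             capture_output=True, text=True, check=True).stdout
        lines = [l for l in out.splitlines() if l.strip()]
        header = lines[0].split()                 # e.g. GPU0 GPU1 ... plus affinity columns
        gpus = [h for h in header if h.startswith("GPU")]
        topo = {}
        for line in lines[1:]:
            cols = line.split()
            if not cols or not cols[0].startswith("GPU"):
                continue                          # skips the legend at the bottom
            src = cols[0]
            topo[src] = dict(zip(gpus, cols[1:1 + len(gpus)]))
        return topo

    print(gpu_topology())
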
0 votes • 1 answer

Version mismatch error when I run nvidia-smi

When I run nvidia-smi I get this error: Failed to initialize NVML: Driver/library version mismatch. But when I run nvcc --version, I get this output: nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2021 NVIDIA Corporation Built on…
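
When NVML reports a driver/library version mismatch, the usual cause is that the user-space NVIDIA libraries were upgraded while an older kernel module is still loaded (nvcc's version is unrelated to this). A small sketch, assuming a Linux host with the nvidia module loaded, to surface the kernel-side version for comparison:

    # Print the version of the currently loaded NVIDIA kernel module.
    # If this differs from the newly installed user-space driver / libnvidia-ml,
    # a reboot (or unloading and reloading the nvidia modules) typically resolves it.
    from pathlib import Path

    proc = Path("/proc/driver/nvidia/version")
    if proc.exists():
        print(proc.read_text())
    else:
        print("NVIDIA kernel module not loaded (or not an NVIDIA driver install).")
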
0 votes • 1 answer

How can I check if my GPU is being used or activate it for use?

GeForce 4080 problem. Hello Stack Overflow community, I am currently working on a project that requires the use of a GPU, and I am not sure whether it is being used or not. Could someone please guide me on how to check if my GPU is being used or…
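
Assuming the project uses PyTorch (the excerpt does not say which framework), a minimal check looks like the sketch below; other frameworks such as TensorFlow have equivalent calls.

    # Quick check that CUDA is visible to PyTorch and that a tensor lands on the GPU.
    import torch

    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))
        x = torch.randn(1024, 1024, device="cuda")
        print("Tensor device:", x.device)   # expect cuda:0
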
0 votes • 0 answers

GCP Container-Optimized OS always requires GPU driver download and installation, despite caching and creating a custom image post-install

I'm using Container-Optimized OS to run an application that takes advantage of GPUs. I have a separate system that creates VMs to run this application on demand (to minimize cost), and I've been trying to reduce the time to get my application…
Ethan • 1,206 • 3 • 21 • 39
0 votes • 2 answers

Query GPU memory usage and/or user by PID

I have a list of PIDs of processes running on different GPUs. I want to get the used GPU memory of each process based on its PID. nvidia-smi yields the information I want; however, I don't know how to grep it, as the output is complex to parse. I have…
JoJolyne • 45 • 2 • 5
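
A sketch of one way to avoid grepping the human-readable table: nvidia-smi has a machine-readable CSV query mode for compute processes. The helper name used_memory_by_pid is hypothetical, and the exact field names accepted by a given driver can be checked with nvidia-smi --help-query-compute-apps.

    # Map PID -> used GPU memory (MiB) using nvidia-smi's CSV query mode.
    import subprocess

    def used_memory_by_pid():
        out = subprocess.run(
            ["nvidia-smi",
             "--query-compute-apps=pid,used_memory",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        usage = {}
        for line in out.strip().splitlines():
            pid, mib = [c.strip() for c in line.split(",")]
            usage[int(pid)] = int(mib)
        return usage

    print(used_memory_by_pid())
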
0 votes • 0 answers

nvidia-smi first detects the card only with privileged users (root / sudo)

I have an eGPU connected to a laptop with a mobile graphics card on an Ubuntu-based Linux system (Pop!_OS). My eGPU graphics card is not detected when running the $ nvidia-smi command as a regular user. However, my internal graphics card is…
Adrien Pacifico • 1,649 • 1 • 15 • 33
0 votes • 0 answers

nvidia-smi does not work; it keeps showing stale, incorrect information

On my machine (Ubuntu 20.04.4 LTS), the command "nvidia-smi" suddenly stopped working; it keeps showing stale historical information. Some programs using the GPU have stopped, but when running "nvidia-smi", we can still see the GPU usage. However, when we…
Nixon Jin • 1 • 1
0 votes • 0 answers

Why does nvidia-smi indicate less used GPU memory than was allocated?

I have allocated 12G of GPU memory like this, but when I use nvidia-smi to check the GPU memory usage, it shows only 4G. I don't understand why. size_t size = 6U * 1024 * 1024 * 1024 / 4; int *devSrc; int *devDest; cudaMalloc((void**)&devSrc,…
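
One likely culprit, assuming the truncated cudaMalloc calls allocate size * sizeof(int) bytes: 6U * 1024 * 1024 * 1024 / 4 is evaluated in 32-bit unsigned int arithmetic before it is ever assigned to the size_t, so it wraps around. A quick sketch of the arithmetic:

    # Reproduce the 32-bit unsigned wraparound in 6U * 1024 * 1024 * 1024 / 4.
    intended = 6 * 1024**3 // 4                   # 1,610,612,736 ints (6 GiB worth of int)
    wrapped  = (6 * 1024**3) % 2**32 // 4         # what 32-bit unsigned arithmetic yields
    print(intended, wrapped)                      # 1610612736 vs 536870912
    print(wrapped * 4 / 2**30, "GiB per buffer")  # 2.0 GiB -> two buffers ~= the 4G observed

Writing the constant with a 64-bit literal (e.g. 6ULL * 1024 * 1024 * 1024 / 4) or doing the whole computation in size_t would give the intended 6 GiB per buffer.
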
0 votes • 0 answers

How to interpret memory.used in nvidia-smi for PyTorch in order to estimate minimum GPU requirements

I'm trying to figure out the minimum GPU requirement for my application. Using nvidia-smi as described here in Colab gives me a maximum value for memory.used of 4910MiB. So I presume a 4GB GPU is not enough, correct? Also on this.. after…
rok • 2,574 • 3 • 23 • 44
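
Worth keeping in mind when reading that number: nvidia-smi's memory.used includes the CUDA context and whatever PyTorch's caching allocator has reserved, so it tends to overstate the strict requirement. A small sketch of how to compare it with PyTorch's own counters:

    # Compare PyTorch's view of GPU memory with what nvidia-smi reports.
    import torch

    # ... run a representative training / inference step here ...
    print("allocated (peak):", torch.cuda.max_memory_allocated() / 2**20, "MiB")
    print("reserved  (peak):", torch.cuda.max_memory_reserved() / 2**20, "MiB")
    # nvidia-smi's memory.used ~= reserved cache + CUDA context (several hundred MiB),
    # so it is an upper bound rather than the model's minimum requirement.
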
0 votes • 0 answers

How to compare the compute power of the NVIDIA GPU in my desktop with Google Colab's, and how do their values differ?

My PC !nvidia-smi result Mon Nov 7 19:55:40 2022 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 512.77 Driver Version: 512.77 CUDA Version: 11.6 …
Vy Do • 46,709 • 59 • 215 • 313
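
nvidia-smi alone mostly reveals the model name, driver and memory; for a rough compute comparison one can run the same small benchmark on both machines. A crude sketch (fp32 matrix-multiply throughput, assuming PyTorch is installed in both places; the function name matmul_tflops is hypothetical):

    # Crude GPU throughput probe: time a large float32 matmul and report TFLOP/s.
    import time
    import torch

    def matmul_tflops(n=4096, iters=20):
        a = torch.randn(n, n, device="cuda")
        b = torch.randn(n, n, device="cuda")
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(iters):
            a @ b
        torch.cuda.synchronize()
        elapsed = time.time() - t0
        return 2 * n**3 * iters / elapsed / 1e12   # 2*n^3 FLOPs per matmul

    print(torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
    print(f"~{matmul_tflops():.1f} TFLOP/s (fp32 matmul)")
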
0 votes • 0 answers

Puzzled by OOM error on GPU when using 15595MiB / 16125MiB

I am using TensorFlow 2.x. My GPU memory is 16125MiB, but my model requires 15595MiB according to nvidia-smi. With this total usage, I get an OOM after some time, even when setting the minimum batch size. I also tried the following, but as soon as…
Phys • 508 • 3 • 11 • 22
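
Worth noting when reading nvidia-smi here: by default TensorFlow reserves most of the GPU memory up front, so the 15595MiB figure is not necessarily what the model strictly needs. A common sketch to make usage grow on demand (and optionally cap it) looks roughly like this:

    # Let TensorFlow allocate GPU memory on demand instead of grabbing it all up front.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    # Optionally cap usage with a logical device configuration, e.g. ~14 GiB:
    # tf.config.set_logical_device_configuration(
    #     gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=14336)])
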
0 votes • 0 answers

Docker container non-root user: Failed to initialize NVML: Insufficient Permissions

Outside of the container, my user (UID=1000) is able to use nvidia-smi. However, inside the Docker container, the non-root user (same UID of 1000) is unable to use nvidia-smi, running into "Failed to initialize NVML: Insufficient Permissions".…
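
A small diagnostic sketch, to be run inside the container, that shows whether the NVIDIA device nodes are even readable by the non-root user; the appropriate fix then depends on how the devices are injected (NVIDIA container runtime settings, device cgroup rules, or file modes on the host).

    # Inspect the NVIDIA device nodes the container can see and who may access them.
    import glob, os, stat

    print("uid/gid:", os.getuid(), os.getgid())
    for dev in sorted(glob.glob("/dev/nvidia*")):
        st = os.stat(dev)
        print(dev, oct(stat.S_IMODE(st.st_mode)),
              "owner:", st.st_uid, st.st_gid,
              "readable+writable:", os.access(dev, os.R_OK | os.W_OK))
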