Highest Voted 'nvidia' Questions - Server Fault Stack Exchange

1 vote, 1 answer

Nvidia Pascal architecture: DMA Size / maximum amount of host system RAM?

We are planning to build a pair of multi-GPU Linux servers for machine learning and data science tasks. Per our requirements, we need to put a lot of RAM in these machines; we're planning on 24x 64GiB LRDIMMs for a total of 1.5TiB. For GPUs, we were…

asked Jul 12 '16 at 10:48 by mvoelske (reputation 111; badges: 3)

1 vote, 2 answers

Ganglia's GPU Nvidia module: do we need to patch the ganglia-webfrontend?

I am trying to add the GPU Nvidia module to Ganglia (/ganglia/gmond_python_modules/gpu/nvidia/). Do we need to apply the ganglia_web.patch patch? If I do not apply the patch, I don't see any GPU metrics when I go to http://localhost/ganglia/ . If I…

monitoring ganglia nvidia

asked Apr 21 '16 at 03:56 by Franck Dernoncourt (reputation 1,022; badges: 2, 14, 32)

1 vote, 0 answers

NVIDIA driver displaying odd BIOS and UUID under GRID K2

I have a number of servers with NVIDIA GRID K2 Tesla cards in them. Initially these were working fine, but I recently upgraded the kernel driver and found that CUDA-based apps no longer detected any GPUs. On closer…

linux coreos nvidia

asked Jun 15 '15 at 03:34 by hookenz (reputation 14,472; badges: 23, 88, 143)

1 vote, 0 answers

NVIDIA Grid / Gaming drivers licensing issues AWS EC2

I'm following https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/install-nvidia-driver.html#nvidia-gaming-driver in order to install NVIDIA Gaming drivers to unlock higher resolutions on AWS EC2 on a g4dn.xlarge instance. Everything is going…

amazon-ec2 windows-server-2019 nvidia

asked Sep 01 '23 at 02:57 by Tommy B. (reputation 1,413; badges: 2, 14, 14)

1 vote, 0 answers

Hyper-V GPU passthrough with NVIDIA A100: no display

I am currently trying to get some NVIDIA A100 GPUs to work on our Hyper-V hypervisor. I managed to set up GPU passthrough to a VM, but the problem is that I don't get a video display. I assume the problem is that the NVIDIA A100s don't have a…

virtual-machines hyper-v windows-server-2019 nvidia gpu

asked Aug 23 '23 at 07:14 by C0dR (reputation 161; badges: 7)

1 vote, 1 answer

Does a defunct process still allocate resources in the system?

I have a production machine (Ubuntu 18.04) that runs processes on Nvidia GPUs. A certain process allocated GPU memory and is now defunct, leaving the GPUs basically unusable. ps -o ppid= -p returns 1, which means that PID 1 is the parent of my…

linux process nvidia

asked Apr 18 '23 at 15:33 by Marco Montevechi Filho (reputation 13; badges: 3)

1 vote, 0 answers

Linux: CUDA (PyTorch) does not allocate available VRAM

I am trying out pixray/clipit, but CUDA fails to allocate the remaining 1GiB of my graphics card. My graphics card is an "Nvidia GTX 1660 super", which has the same amount of RAM as the "Nvidia GTX 1660 Ti" that belongs to somebody I know, and it…

linux memory nvidia outofmemoryerror cuda

asked Aug 27 '22 at 07:30 by france1 (reputation 23; badges: 9)

1 vote, 1 answer

Ubuntu Server 20.04 LTS: installing NVIDIA drivers and CUDA installs GNOME as well

I have a GPU server that requires CUDA, for example for machine learning tasks. Unfortunately, as soon as I install the NVIDIA drivers and CUDA, a variant of GNOME apparently gets installed as well. This GNOME variant can do almost nothing, the shell…

ubuntu drivers gnome nvidia cuda

asked Sep 10 '21 at 17:19 by Julian Bechtold (reputation 123; badges: 6)

1 vote, 1 answer

GPU server freezes during GPU idling

We have a new Supermicro Server AS-4124GS-TNR equipped with eight NVIDIA RTX A6000. The OS is Ubuntu 20.04.2, the NVIDIA driver version is 460.73.01 (no Nouveau driver used), the CUDA Version is 11.2. We ran a few long-lasting tests on the GPUs and…

ubuntu server-crashes nvidia freeze

asked Jul 14 '21 at 07:39 by user776206 (reputation 13; badges: 4)

1 vote, 1 answer

Is the Pod Resources API disabled on Google Kubernetes Engine?

Problem Summary: We're using DCGM Exporter to collect metrics about GPU workloads. When deployed on GKE, the exporter does not return GPU information about other pods or containers (when it's expected to return that information). This exporter runs…

monitoring kubernetes google-kubernetes-engine nvidia

asked May 05 '21 at 17:31 by Ash (reputation 121; badges: 5)

1 vote, 1 answer

slurm nvidia-docker ignores CUDA_VISIBLE_DEVICES

I have a problem running nvidia-docker containers on a Slurm cluster. Inside the container all GPUs are visible, so it basically ignores the CUDA_VISIBLE_DEVICES environment variable set by Slurm. Outside the container the visible GPUs are correct. Is there a…

docker nvidia slurm

asked Mar 21 '21 at 18:26 by JohnA.Zoidberg (reputation 13; badges: 3)

1 vote, 1 answer

GKE can't schedule newly created pods that demand GPU on newly added nodes with GPUs

When adding new node pools with GPUs, Google Kubernetes Engine can't schedule newly created pods that demand a GPU on these new nodes. This should be automatic, but apparently not for GPU resources: new pods stay in the 'pending' state forever. How to fix…

google-cloud-platform kubernetes google-kubernetes-engine graphics-processing-unit nvidia

asked Jul 17 '20 at 08:19 by Elras (reputation 21; badges: 4)

1 vote, 1 answer

Misbehaving NVLINK with 2080 ti cards?

I am running into problems with NVLinked RTX video cards, and I wonder if someone more experienced with this tech could kindly look at the output below and tell me whether there is a problem. I'm using a pair of MSI 2080 Ti cards and an RTX NVLink bridge by…

linux networking nvidia gpu nvlink

asked May 08 '20 at 14:34 by Eric M (reputation 113; badges: 5)

0 votes, 0 answers

How can I get kernel / early boot output over my NVIDIA GPU using CentOS 7?

I recently installed CentOS 7.7 with KDE on a machine with both onboard graphics and an NVIDIA GTX 1080 Ti. I got the proprietary NVIDIA drivers installed, but it was quite difficult as I couldn't see what was happening during boot up past a certain…

linux centos boot grub nvidia

asked Sep 26 '19 at 17:46 by josePhoenix (reputation 183; badges: 2, 8)

0 votes, 1 answer

PCI on NVIDIA Tesla P100 in shared passthrough mode is disabled

I have successfully completed the NVIDIA Tesla P100 GRID setup on the vSphere host server with VMware ESXi 6.7. While trying to add PCI devices to the VM, the option to choose PCI devices is shown as in the “Add Other Hardware” setting in “Virtual…

vmware-esxi vmware-vsphere vmware-esx pci nvidia

asked Mar 14 '19 at 11:29 by Sarath Zacharia (reputation 31; badges: 1, 5)

Questions tagged [nvidia]