
This is on a Windows 10 computer with no monitor attached to the NVIDIA card. I've included output from nvidia-smi showing that more than 5.04 GB was available.

Here is the TensorFlow code asking it to allocate just slightly more than I had previously seen succeed (I want this to be as close as possible to a memory fraction of 1.0):

config = tf.ConfigProto()
#config.gpu_options.allow_growth=True
config.gpu_options.per_process_gpu_memory_fraction=0.84
config.log_device_placement=True
sess = tf.Session(config=config)

Just before running the lines above in a Jupyter notebook, I ran nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.51                 Driver Version: 376.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106... WDDM  | 0000:01:00.0     Off |                  N/A |
|  0%   27C    P8     5W / 120W |     43MiB /  6144MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Output from TF (after it successfully allocates 5.01 GB) shows "failed to allocate 5.04G (5411658752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY"; you need to scroll to the right to see it below:

2017-12-17 03:53:13.959871: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 5.01GiB
2017-12-17 03:53:13.960006: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2017-12-17 03:53:13.961152: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 5.04G (5411658752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
2017-12-17 03:53:14.151073: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
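For reference, the 5.04G (5411658752 bytes) in that failing line is just the configured fraction applied to the card's total memory rather than to the free memory stream_exec reports, which is why 0.84 of a 6 GiB card lands just above the 5.01 GiB that is apparently available. A quick arithmetic check (plain Python, nothing TensorFlow-specific):

total_bytes = 6144 * 1024 * 1024   # 6144 MiB total, per the nvidia-smi output above
fraction = 0.84                    # per_process_gpu_memory_fraction from the config
print(fraction * total_bytes)      # ~5.41e9 bytes; the log's 5411658752 appears to be
                                   # this value rounded down to an allocation-granularity multiple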

My best guess is that some policy in an NVIDIA user-level DLL is preventing use of all of the memory (perhaps to leave room for attaching a monitor?).

If that theory is correct, I'm looking for any user-accessible knob to turn that behavior off on Windows 10. If I'm on the wrong track, any help pointing me in the right direction is appreciated.

Edit #1:

I realized I did not include this bit of research: the following code in TensorFlow indicates that stream_exec is 'telling' TensorFlow only 5.01 GiB is free. This is the primary reason for my current theory that some NVIDIA component is preventing the allocation. (However, I could be misunderstanding which component implements the instantiated stream_exec.)

auto stream_exec = executor.ValueOrDie();
int64 free_bytes;
int64 total_bytes;
if (!stream_exec->DeviceMemoryUsage(&free_bytes, &total_bytes)) {
  // Logs internally on failure.
  free_bytes = 0;
  total_bytes = 0;
}
const auto& description = stream_exec->GetDeviceDescription();
int cc_major;
int cc_minor;
if (!description.cuda_compute_capability(&cc_major, &cc_minor)) {
  // Logs internally on failure.
  cc_major = 0;
  cc_minor = 0;
}
LOG(INFO) << "Found device " << i << " with properties: "
          << "\nname: " << description.name() << " major: " << cc_major
          << " minor: " << cc_minor
          << " memoryClockRate(GHz): " << description.clock_rate_ghz()
          << "\npciBusID: " << description.pci_bus_id() << "\ntotalMemory: "
          << strings::HumanReadableNumBytes(total_bytes)
          << " freeMemory: " << strings::HumanReadableNumBytes(free_bytes);
}
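As far as I can tell from the stream_executor source, DeviceMemoryUsage ultimately gets those numbers from the CUDA driver API (cuMemGetInfo). Here is a minimal ctypes sketch for asking the driver directly, bypassing TensorFlow, to see the same free/total pair; the Windows DLL name (nvcuda.dll) and the *_v2 export names are assumptions taken from the standard CUDA headers, so treat this as illustrative rather than production code:

# Query free/total VRAM straight from the CUDA driver API via ctypes.
# Assumes nvcuda.dll and the usual *_v2 driver exports (an assumption, see above).
import ctypes

cuda = ctypes.WinDLL('nvcuda.dll')

def check(status, name):
    # Driver API calls return 0 (CUDA_SUCCESS) on success.
    if status != 0:
        raise RuntimeError('%s failed with CUDA error %d' % (name, status))

device = ctypes.c_int()
context = ctypes.c_void_p()
free_bytes = ctypes.c_size_t()
total_bytes = ctypes.c_size_t()

check(cuda.cuInit(0), 'cuInit')
check(cuda.cuDeviceGet(ctypes.byref(device), 0), 'cuDeviceGet')
check(cuda.cuCtxCreate_v2(ctypes.byref(context), 0, device), 'cuCtxCreate')
check(cuda.cuMemGetInfo_v2(ctypes.byref(free_bytes), ctypes.byref(total_bytes)), 'cuMemGetInfo')
print('free: %.2f GiB  total: %.2f GiB' % (free_bytes.value / 2.0**30, total_bytes.value / 2.0**30))
check(cuda.cuCtxDestroy_v2(context), 'cuCtxDestroy')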

Edit #2:

The thread below indicates that Windows 10 pervasively prevents full use of VRAM on secondary video cards used for compute by reserving a percentage of the VRAM: https://social.technet.microsoft.com/Forums/windows/en-US/15b9654e-5da7-45b7-93de-e8b63faef064/windows-10-does-not-let-cuda-applications-to-use-all-vram-on-especially-secondary-graphics-cards?forum=win10itprohardware

That seems implausible, given it would mean all Windows 10 boxes are inherently worse than Windows 7 for anything where VRAM on compute-dedicated graphics cards could plausibly be the bottleneck.

Edit #3:

Updated the title to be more clearly a question. Feedback indicates this may be better filed as a bug with Microsoft or NVIDIA. I am pursuing other avenues to get this addressed; however, I don't want to assume it cannot be resolved directly.

Further experiments indicate that the issue I am hitting is specific to a single large allocation from a single process. All of the VRAM can be used when another process comes into play.
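For concreteness, here is a rough sketch of that kind of two-process experiment; the fractions and hold time are illustrative (not the exact values I used), and the point is simply that two processes asking for ~45% each can together exceed what a single process can obtain:

# Two TF 1.x processes, each asking for roughly half the card.
# Fractions and hold time are illustrative only.
import multiprocessing as mp
import time

def hold_fraction(fraction, hold_seconds=60):
    import tensorflow as tf              # imported in the child so each process gets its own CUDA context
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    sess = tf.Session(config=config)     # without allow_growth, the allocation happens here
    time.sleep(hold_seconds)             # keep the memory held; watch nvidia-smi meanwhile
    sess.close()

if __name__ == '__main__':
    children = [mp.Process(target=hold_fraction, args=(0.45,)) for _ in range(2)]
    for c in children:
        c.start()
    for c in children:
        c.join()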

Edit #4:

The failure here is an allocation failure, and according to the nvidia-smi output above, 43 MiB is in use (perhaps by the system?) but not by any identifiable process. The failure I'm seeing is for a single monolithic allocation, which under a typical allocation model requires a contiguous address range. So the pertinent questions may be: what is causing that 43 MiB to be used, and is it placed in the address space such that the 5.01 GB allocation is the maximum contiguous space available?
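One way to probe that contiguity theory without TensorFlow is to grab VRAM in smaller chunks via the driver API and compare the chunked total to the ~5.01 GiB single-allocation ceiling; if the chunks add up to noticeably more, fragmentation of the address space is implicated. A sketch along the same ctypes lines as in Edit #1 (again, nvcuda.dll and the *_v2 export names are assumptions, and the 256 MiB chunk size is arbitrary):

# Allocate VRAM in 256 MiB chunks until the driver refuses, then report the total.
import ctypes

cuda = ctypes.WinDLL('nvcuda.dll')
device = ctypes.c_int()
context = ctypes.c_void_p()
cuda.cuInit(0)
cuda.cuDeviceGet(ctypes.byref(device), 0)
cuda.cuCtxCreate_v2(ctypes.byref(context), 0, device)

chunk = 256 * 1024 * 1024                # 256 MiB per allocation (arbitrary)
pointers = []
while True:
    ptr = ctypes.c_uint64()              # CUdeviceptr is a 64-bit value
    status = cuda.cuMemAlloc_v2(ctypes.byref(ptr), ctypes.c_size_t(chunk))
    if status != 0:                      # CUDA_ERROR_OUT_OF_MEMORY (or any failure) ends the loop
        break
    pointers.append(ptr)

print('allocated %.2f GiB in %d chunks' % (len(pointers) * chunk / 2.0**30, len(pointers)))
for ptr in pointers:
    cuda.cuMemFree_v2(ptr)
cuda.cuCtxDestroy_v2(context)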

• This issue has come up repeatedly in the NVIDIA developer forums. The allocation limit seems to be closer to 81% of GPU memory according to most observations, *across a variety of GPUs*. Best anybody can tell, this appears to be a "feature" of the driver model used by Windows 10, WDDM 2.0. Earlier Windows versions that use driver model WDDM 1.x do not seem to suffer from this issue, with the same GPUs. – njuffa Dec 20 '17 at 07:08
• So far it looks like it is a per-process limitation ... which from my point of view is a stunningly difficult global policy to defend. Ideally there would be a registry workaround. If not, I'll release the bounty to someone who can point to official confirmation that the policy is in place, intentional, and without workaround. – Steve Steiner Dec 20 '17 at 15:32
• I'm curious if there is a substantive reason for the downvotes on this question. Is it something I can fix? Poorly worded? Too confusing? The issue is failing to get traction so far ... is it truly the wrong forum? Note I'm a Microsoft veteran and I expect this to be exactly the kind of thing they would want addressed rather than simply left to fester with rumor. When I was there we would look at these forums and answer exactly this kind of question. – Steve Steiner Dec 20 '17 at 15:48
• I am not the downvoter but did consider a close vote. This is not a forum, it is a Q&A site, and *there is no clear question here*. Your write-up reads more like a bug report, and you may want to file it as such with Microsoft (or, if your theory is that this has something to do with NVIDIA's drivers, file one with NVIDIA; considering that NVIDIA would shoot themselves in the foot with such a "feature", they seem an unlikely culprit). – njuffa Dec 20 '17 at 18:09
• Apologies, I was using 'forum' in the broad sense; I do realize this is a Q&A site. It has been years since I've actively used it. It's definitely fair to say the implied question is obscured: "How can I configure Windows 10 to allow 100% use of VRAM on a secondary GPU from a single process?" While it may be this is a bug rather than a misconfiguration on my part, I don't see how that bar can apply before the answer is known. (Also, I am pursuing multiple avenues ... presumably closing this would prevent me from posting the answer if it is determined by another channel.) – Steve Steiner Dec 20 '17 at 18:21

2 Answers


It is clearly not possible for now, as the Windows Display Driver Model (WDDM) 2.x has a defined limit, and no process can override it (legally).

Assuming you have already played with the "Prefer Maximum Performance" setting, with that you can push it to at most around 92%, given the power supply.

This should help if you would like to know more about WDDM 2.x in detail:

https://learn.microsoft.com/en-us/windows-hardware/drivers/display/what-s-new-for-windows-threshold-display-drivers--wddm-2-0-

– N.K
  • I actually have not attempted that yet! Using the NVIDIA Control Panel requires attaching a monitor first. Given I had reinstalled Windows 10 from scratch to ensure the GPU was never attached to a monitor, undoing that hard-won property was not the first thing on my list to try. If this can get it to 92%, that is a nice boost. – Steve Steiner Dec 27 '17 at 01:21
  • Sadly, the "Prefer Maximum Performance" setting appears not to affect the issue in any way. I attached a monitor, changed the setting, and reproduced the original problem. I rebooted and reproduced the original problem, then removed the monitor again (isolating the GPU), rebooted again, and still reproduced the original problem. (By the original problem I mean forcing a failure to allocate 5.04 GB while demonstrating that 5.01 GB is possible.) – Steve Steiner Dec 27 '17 at 02:06
  • Have you tried using any benchmark test application? Also, if you can, reattach the monitor, go to the Windows power management settings, and set performance to the highest available setting. – N.K Dec 27 '17 at 07:14
  • This answer implies there is a "non-legal" way to change this WDDM? Please advise how to do this! – user3496060 Aug 24 '19 at 16:03

I believe, for cards that support the TCC driver, this is a solvable problem. Sadly, my GTX 1060 does not appear to support that.

I would need such a card to verify. Absent someone producing a solution that works on a GTX 1060, I'd definitely release the bounty to someone capable of demonstrating a single process using 100% of VRAM on Windows 10 with the TCC driver.
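For reference, on cards that do support TCC (typically Quadro/Tesla-class parts), the driver model is switched with nvidia-smi from an administrator prompt followed by a reboot. Something along the lines below; the exact flag values are from memory, so verify them against nvidia-smi --help on your system (I can't test this on the 1060, which rejects TCC):

nvidia-smi -i 0 -dm 1    # -dm / --driver-model: 1 = TCC, 0 = WDDM; run as admin, then reboot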

– Steve Steiner