I have an NVIDIA RTX 2070 GPU and CUDA installed, I have WebGL support, but when I run the various TFJS examples, such as the Addition RNN Example or the Visualizing Training Example, I see my CPU usage go to 100% but the GPU (as metered via nvidia-smi) never gets used.

How can I troubleshoot this? I don't see any console messages about not finding the GPU. The TFJS docs are really vague about this, only saying that it uses the GPU if WebGL is supported and otherwise falls back to CPU if it can't find the WebGL. But again, WebGL is working. So...how to help it find my GPU?
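
For what it's worth, here's the kind of minimal test page I'd use to check which backend TFJS picks when I control the code myself (a rough sketch; the CDN URL/version is just an example, and the `WEBGL_VERSION` flag name may vary across TFJS versions):

```html
<!-- Sketch of a minimal test page: load TFJS from a CDN and log which backend it settles on. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
<script>
  (async () => {
    await tf.ready();  // wait for backend initialization to finish
    console.log('backend:', tf.getBackend());  // want 'webgl' here, not 'cpu'
    console.log('WEBGL_VERSION flag:', tf.env().getNumber('WEBGL_VERSION'));  // 0 means no WebGL
  })();
</script>
```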

Other related SO questions seem to be about tfjs-node-gpu, e.g., getting one's own tfjs-node-gpu installation working. This is not about that. I'm talking about running the main TFJS examples on the official TFJS pages from my browser.

Browser is the latest Chrome for Linux. Running Ubuntu 18.04.

EDIT: Since someone will ask, chrome://gpu shows that hardware acceleration is enabled. The output log is rather long, but here's the top:

Graphics Feature Status
Canvas: Hardware accelerated
Flash: Hardware accelerated
Flash Stage3D: Hardware accelerated
Flash Stage3D Baseline profile: Hardware accelerated
Compositing: Hardware accelerated
Multiple Raster Threads: Enabled
Out-of-process Rasterization: Disabled
OpenGL: Enabled
Hardware Protected Video Decode: Unavailable
Rasterization: Software only. Hardware acceleration disabled
Skia Renderer: Enabled
Video Decode: Unavailable
Vulkan: Disabled
WebGL: Hardware accelerated
WebGL2: Hardware accelerated
  • Could you please try the command `console.log(tf.getBackend())` to see which backend is used? I think that you need to set your backend with `tf.setBackend('webgl')` – edkeveked May 16 '20 at 22:19
  • When I open the developer console for the Addition RNN page and type "console.log(tf.getBackend ())", I get this error message: "Uncaught ReferenceError: tf is not defined VM288.1 at :1:13" – sh37211 May 16 '20 at 23:39
  • I thought you were talking about your own script not using the gpu – edkeveked May 17 '20 at 07:05

1 Answer

Got it essentially solved. I found this older post, which points out that you need to check whether WebGL is actually using the "real" GPU or just the Intel integrated graphics that's part of the CPU.

To check this, go to https://alteredqualia.com/tmp/webgl-maxparams-test/, scroll down to the very bottom, and look at the Unmasked Renderer and Unmasked Vendor values.

In my case, these were showing Intel, not my NVIDIA GPU.
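
You can also query the same information straight from the browser's JavaScript console via the `WEBGL_debug_renderer_info` extension -- a rough sketch; the extension isn't guaranteed to be exposed by every browser:

```js
// Rough sketch: ask WebGL which vendor/renderer is actually backing the context.
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl2') || canvas.getContext('webgl');
const dbg = gl && gl.getExtension('WEBGL_debug_renderer_info');
if (dbg) {
  console.log('Unmasked vendor:  ', gl.getParameter(dbg.UNMASKED_VENDOR_WEBGL));
  console.log('Unmasked renderer:', gl.getParameter(dbg.UNMASKED_RENDERER_WEBGL));
} else {
  console.log('WebGL context or WEBGL_debug_renderer_info extension not available');
}
```

If that prints the Intel device rather than the NVIDIA one, you're in the same situation.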

My System76 laptop can run in "Hybrid Graphics" mode, in which big computations are performed on the NVIDIA GPU while smaller things like GUI elements run on the integrated graphics. (This saves battery life.) But while some applications can take advantage of the GPU in Hybrid Graphics mode -- I just ran a great Adversarial Latent AutoEncoder demo that maxed out my GPU in that mode -- not all can. Chrome is apparently one of the latter.

To get WebGL to see my NVIDIA GPU, I needed to reboot my system in "full NVIDIA Graphics" mode.

After this reboot, some of the TFJS examples do use the GPU, such as the Visualizing Training example, which now trains almost instantly instead of taking a few minutes. But the Addition RNN example still only uses the CPU; that may be because of the missing backend declaration that @edkeveked pointed out.
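
For anyone running their own TFJS code rather than the hosted examples, explicitly requesting the WebGL backend (as @edkeveked suggested) looks roughly like this -- a sketch, assuming `tf` is the imported `@tensorflow/tfjs` module and the code runs inside an async function:

```js
// Sketch: explicitly request the WebGL backend before creating any tensors.
// setBackend() resolves to false if WebGL initialization fails, so check the result.
const ok = await tf.setBackend('webgl');
await tf.ready();
console.log('setBackend succeeded:', ok, '| active backend:', tf.getBackend());
```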

  • I didn't have to set my full system into NVIDIA mode. Instead, I went to "Graphics Settings" in the Windows Settings app, found the Chrome executable, and set it to use the high-performance graphics adapter. Then I restarted Chrome and it worked. – user3413723 Jan 02 '21 at 15:50