I am currently converting a PyTorch CUDA project (https://github.com/suno-ai/bark) to DirectML so I can use my AMD GPU (an RX 6700 XT), and I'm running into RuntimeError: Cannot set version_counter for inference tensor. I've tried writing to the developer, but he says he has no experience with AMD.
Following the gpu-pytorch-windows docs, I changed every .to(device) call in generation.py to .to(dml); the files I modified are generation.py in the bark folder and in build\lib\bark\, respectively. When I run the project, I can see that the GPU starts correctly, but then I get the error shown further down.
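For reference, this is roughly what my edit looks like (a minimal sketch; the Linear layer is just a stand-in for the actual Bark models):

import torch
import torch_directml  # from the torch-directml package

dml = torch_directml.device()  # DirectML device for the RX 6700 XT

# Before: model.to(device) with device = "cuda"
# After:  model.to(dml)
model = torch.nn.Linear(4, 4).to(dml)  # stand-in for the Bark models
x = torch.randn(1, 4).to(dml)
y = model(x)
print(y.device)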
I really appreciate all the help you've given me so far. I was hoping you could help me out again. I've been reading a lot and trying different things, but I can't find much information on this error I'm getting. I don't know what to do or if I'm doing something wrong.
From here: https://github.com/suno-ai/bark/issues/271
python .\run.py
No GPU being used. Careful, inference might be very slow!
0%| | 0/100 [00:00<?, ?it/s]Traceback (most recent call last):
File "C:\Users\NoeXVanitasXJunk\bark\run.py", line 13, in <module>
audio_array = generate_audio(text_prompt)
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 107, in generate_audio
semantic_tokens = text_to_semantic(
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 25, in text_to_semantic
x_semantic = generate_text_semantic(
File "C:\Users\NoeXVanitasXJunk\bark\bark\generation.py", line 460, in generate_text_semantic
logits, kv_cache = model(
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 208, in forward
x, kv = block(x, past_kv=past_layer_kv, use_cache=use_cache)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 121, in forward
attn_output, prev_kvs = self.attn(self.ln_1(x), past_kv=past_kv, use_cache=use_cache)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 50, in forward
q, k ,v = self.c_attn(x).split(self.n_embd, dim=2)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Cannot set version_counter for inference tensor
0%| | 0/100 [00:00<?, ?it/s]
I am on Python 3.9.16. Could you help me with this?
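From what I've been able to read, the message seems to be about "inference tensors": tensors created inside torch.inference_mode() carry no version counter, so anything that later tries to set one fails with exactly this error. This is my understanding of it in a tiny sketch (not code from Bark itself):

import torch

with torch.inference_mode():
    t = torch.ones(3)

print(t.is_inference())  # True: t is an inference tensor without a version counter

# Cloning outside inference mode gives back a normal tensor:
safe = t.clone()
print(safe.is_inference())  # False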
One more thing: I've read that with torch-mlir it's possible to use an AMD card, but I'm not sure whether that works on Windows, or whether it needs something extra or some special DirectML setup. I installed torch-mlir and the project works with it, but it only uses the CPU, not the GPU, and I'm not sure how to configure it to use the GPU.
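For the DirectML side at least, this is how I'm checking that the card is visible (assuming I've understood the torch_directml helpers correctly):

import torch_directml

print(torch_directml.is_available())  # True if a DirectML adapter exists
print(torch_directml.device_count())  # number of adapters found
print(torch_directml.device_name(0))  # should report the RX 6700 XT
dml = torch_directml.device()         # default DirectML device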
UPDATE 2:
When I try setting mode=False for inference, replacing inference_mode() with inference_mode(mode=False), the first stage now completes (with a UserWarning that aten::tril.out is not supported on the DML backend and falls back to the CPU), but then I get a new error:
No GPU being used. Careful, inference might be very slow!
0%| | 0/100 [00:00<?, ?it/s]C:\Users\NoeXVanitasXJunk\bark\bark\model.py:80: UserWarning: The operator 'aten::tril.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
y = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=self.dropout, is_causal=is_causal)
100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:32<00:00, 3.04it/s]
0%| | 0/31 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\NoeXVanitasXJunk\bark\run.py", line 13, in <module>
audio_array = generate_audio(text_prompt)
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 113, in generate_audio
out = semantic_to_waveform(
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 54, in semantic_to_waveform
coarse_tokens = generate_coarse(
File "C:\Users\NoeXVanitasXJunk\bark\bark\generation.py", line 633, in generate_coarse
x_in = torch.hstack(
RuntimeError
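The next thing I plan to try is swapping torch.inference_mode for torch.no_grad in generation.py, since no_grad also disables autograd but doesn't create inference tensors. This is just my assumption from the PyTorch docs, not something the Bark authors suggest:

import torch

# torch.no_grad() disables gradient tracking but, unlike
# torch.inference_mode(), its results are normal tensors:
with torch.no_grad():
    x = torch.ones(3)

print(x.is_inference())  # False: usable anywhere afterwards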