I am currently converting a PyTorch CUDA project (https://github.com/suno-ai/bark) to DirectML so I can use my AMD GPU (an RX 6700 XT), and I'm running into RuntimeError: Cannot set version_counter for inference tensor. I've tried writing to the developer, but he says he has no experience with AMD.
Following the gpu-pytorch-windows docs, I changed every .to(device) call in generation.py to .to(dml); the files I modified are generation.py in the bark folder and in build\lib\bark\, respectively. When I run the project, I can see that the GPU starts correctly, but then I get the error shown further down.
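For reference, this is roughly what my edit looks like (a minimal sketch; the Linear layer is just a stand-in for the actual Bark models):

import torch
import torch_directml  # from the torch-directml package

dml = torch_directml.device()  # DirectML device for the RX 6700 XT

# Before: model.to(device) with device = "cuda"
# After:  model.to(dml)
model = torch.nn.Linear(4, 4).to(dml)  # stand-in for the Bark models
x = torch.randn(1, 4).to(dml)
y = model(x)
print(y.device)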
I really appreciate all the help you've given me so far. I was hoping you could help me out again. I've been reading a lot and trying different things, but I can't find much information on this error I'm getting. I don't know what to do or if I'm doing something wrong.
From here: https://github.com/suno-ai/bark/issues/271
python .\run.py
No GPU being used. Careful, inference might be very slow!
0%| | 0/100 [00:00<?, ?it/s]Traceback (most recent call last):
File "C:\Users\NoeXVanitasXJunk\bark\run.py", line 13, in <module>
audio_array = generate_audio(text_prompt)
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 107, in generate_audio
semantic_tokens = text_to_semantic(
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 25, in text_to_semantic
x_semantic = generate_text_semantic(
File "C:\Users\NoeXVanitasXJunk\bark\bark\generation.py", line 460, in generate_text_semantic
logits, kv_cache = model(
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 208, in forward
x, kv = block(x, past_kv=past_layer_kv, use_cache=use_cache)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 121, in forward
attn_output, prev_kvs = self.attn(self.ln_1(x), past_kv=past_kv, use_cache=use_cache)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 50, in forward
q, k ,v = self.c_attn(x).split(self.n_embd, dim=2)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Cannot set version_counter for inference tensor
0%| | 0/100 [00:00<?, ?it/s]
I am on Python 3.9.16. Could you help me with this?
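From what I've been able to read, the message seems to be about "inference tensors": tensors created inside torch.inference_mode() carry no version counter, so anything that later tries to set one fails with exactly this error. This is my understanding of it in a tiny sketch (not code from Bark itself):

import torch

with torch.inference_mode():
    t = torch.ones(3)

print(t.is_inference())  # True: t is an inference tensor without a version counter

# Cloning outside inference mode gives back a normal tensor:
safe = t.clone()
print(safe.is_inference())  # False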
One more thing: I've read that with torch-mlir it's possible to use an AMD card, but I'm not sure whether that works on Windows, or whether it needs something extra or some special DirectML setup. I installed torch-mlir and the project works with it, but it only uses the CPU, not the GPU, and I'm not sure how to configure it to use the GPU.
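For the DirectML side at least, this is how I'm checking that the card is visible (assuming I've understood the torch_directml helpers correctly):

import torch_directml

print(torch_directml.is_available())  # True if a DirectML adapter exists
print(torch_directml.device_count())  # number of adapters found
print(torch_directml.device_name(0))  # should report the RX 6700 XT
dml = torch_directml.device()         # default DirectML device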
UPDATE 2:
When I try setting mode=False for inference, replacing inference_mode() with inference_mode(mode=False), the first stage now completes (with a UserWarning that aten::tril.out is not supported on the DML backend and falls back to the CPU), but then I get a new error:
No GPU being used. Careful, inference might be very slow!
0%| | 0/100 [00:00<?, ?it/s]C:\Users\NoeXVanitasXJunk\bark\bark\model.py:80: UserWarning: The operator 'aten::tril.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
y = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=self.dropout, is_causal=is_causal)
100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:32<00:00, 3.04it/s]
0%| | 0/31 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\NoeXVanitasXJunk\bark\run.py", line 13, in <module>
audio_array = generate_audio(text_prompt)
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 113, in generate_audio
out = semantic_to_waveform(
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 54, in semantic_to_waveform
coarse_tokens = generate_coarse(
File "C:\Users\NoeXVanitasXJunk\bark\bark\generation.py", line 633, in generate_coarse
x_in = torch.hstack(
RuntimeError
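The next thing I plan to try is swapping torch.inference_mode for torch.no_grad in generation.py, since no_grad also disables autograd but doesn't create inference tensors. This is just my assumption from the PyTorch docs, not something the Bark authors suggest:

import torch

# torch.no_grad() disables gradient tracking but, unlike
# torch.inference_mode(), its results are normal tensors:
with torch.no_grad():
    x = torch.ones(3)

print(x.is_inference())  # False: usable anywhere afterwards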