
I'm trying to run the Whisper zoo model from the DJL examples on a GPU.

On the first run, I got an error saying that two devices were found: CUDA and CPU.

As I understood it, this error occurs because only the model is on the GPU, not the input. As a fix, I created two classes extending WhisperTranslator and WhisperTranslatorFactory, namely WhisperGPUTranslator and WhisperGPUTranslatorFactory. Their purpose is to call NDArray.toDevice(gpu(0), true) on the processed input audio data.

This let the application start. However, as soon as inference runs, the JVM crashes with this error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f971f918000, pid=22329, tid=22431
#
# JRE version: OpenJDK Runtime Environment (11.0.20+8) (build 11.0.20+8-post-Ubuntu-1ubuntu123.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.20+8-post-Ubuntu-1ubuntu123.04, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [0.23.0-libdjl_torch.so+0x118000]  c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::operator->() const+0xc

I tried changing the DJL and DJL PyTorch engine versions from 23.0 down to 21.1 (below which the example code doesn't compile) and got exactly the same error every time.

I'm hoping someone here has some insight into how to fix this. Please help.

1 Answer

Figured it out; posting here in case others run into this too.

In the processInput method of WhisperTranslator, replace this line:

NDArray placeholder = ctx.getNDManager().create("").toDevice(device, true);

with

NDArray placeholder = ctx.getNDManager().create(new float[0]).toDevice(device, true);

Of course, you also have to add toDevice(gpu(0), true) in your own extension of the WhisperTranslator class (for me, WhisperGPUTranslator, as described in the question).
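A minimal sketch of both pieces of the fix: moving the translator's inputs to the model's device, and building the placeholder from a zero-length float array instead of a String. It assumes the DJL `api` and `pytorch-engine` artifacts are on the classpath. `DeviceHelpers`, `toModelDevice` and `emptyPlaceholder` are hypothetical names that do not belong to DJL or the Whisper example, and the real `WhisperTranslator.processInput` body is not reproduced here. The demo falls back to the CPU when no GPU is visible, so it can also run on a machine without CUDA.

```java
import ai.djl.Device;
import ai.djl.engine.Engine;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDList;
import ai.djl.ndarray.NDManager;

// Hypothetical helper class; not part of DJL or the Whisper example code.
final class DeviceHelpers {

    private DeviceHelpers() {}

    // Copies every array in the list onto the target (model) device.
    static NDList toModelDevice(NDList inputs, Device device) {
        return inputs.toDevice(device, true);
    }

    // Builds an empty numeric placeholder on the target device. Following the
    // fix above, a zero-length float array replaces the String array from create("").
    static NDArray emptyPlaceholder(NDManager manager, Device device) {
        return manager.create(new float[0]).toDevice(device, true);
    }

    public static void main(String[] args) {
        // Prefer GPU 0 when the engine reports one; otherwise stay on the CPU.
        Device device = Engine.getInstance().getGpuCount() > 0 ? Device.gpu(0) : Device.cpu();
        // Create the inputs on the CPU, as a preprocessing step typically would.
        try (NDManager manager = NDManager.newBaseManager(Device.cpu())) {
            NDList inputs = new NDList(manager.create(new float[] {0.1f, 0.2f, 0.3f}));
            NDList moved = toModelDevice(inputs, device);
            NDArray placeholder = emptyPlaceholder(manager, device);
            System.out.println(moved.head().getDevice() + " " + placeholder.getShape());
        }
    }
}
```

Inside a WhisperTranslator subclass, the same two calls would presumably go in processInput, with the target device passed in through the factory or constructor; that wiring is an assumption and depends on how your subclass is set up.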