I'm trying to run the following code using TensorFlow, Hugging Face's transformers library, and the openai/whisper-base model:

import tensorflow as tf
import transformers

# Load the model and tokenizer
model = transformers.TFWhisperModel.from_pretrained('openai/whisper-base')
tokenizer = transformers.WhisperTokenizer.from_pretrained('openai/whisper-base')

# Read the audio file and convert it to a tensor
audio_file = "data/preamble.wav"
with open(audio_file, 'rb') as f:
    audio = f.read()
input_ids = tf.constant(tokenizer.encode(audio, return_tensors='tf'))

# Transcribe the audio
output = model(input_ids)[0]
transcription = tokenizer.decode(output, skip_special_tokens=True)

with open("something.txt", "w") as f:
    f.write(transcription)

I'm getting a huge error output, too big to copy and paste here, so below is a snippet of it. The entire message has the same form as the snippet, except for the last line, which I've pasted below. The attached picture is the top of the error message, which I had to screenshot before it disappeared.

[Screenshot: top of the error message, the first output to the terminal after running the script]

Bottom of Error Snippet

c\xff\x0c\x00\xeb\xff\xb3\xff\xc5\xff\x0f\x00\xde\xff\x16\x00B\x00\x0e\x00\xfd\xff$\x000\x00\xff\x
ff\xe7\xff<\x00\xfb\xff\n\x00/\x008\x00\x06\x00\x17\x00\x1d\x00\xde\xff\xf2\xff\xec\xff\xff\xff\x0
f\x00\x1b\x008\x00\x1d\x003\x00%\x00#\x00\r\x00\x16\x00\x1d\x00\x19\x00\xf7\xff\x14\x00\xff\xff\xc
c\xff\x06\x00\xf1\xff\x11\x00\xf0\xff*\x00P\x00\xe7\xffH\x00\t\x00\xd0\xff\xd0\xff\xee\xff\xf6\xff
\xc6\xff\xe4\xff\xce\xff' is not valid. Should be a string, a list/tuple of strings or a list/tuple
 of integers.

The last line, "is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.", is my only clue as to my next step.

I cannot scroll to the top to find which line of my code is throwing the error. I'm new to machine learning and I don't know what I'm looking at. Any help is appreciated.

Thank you in advance!!!

I tried wrapping output and transcription in a try/except block, with no change: same output message.

I've tried:

input_ids = str(tf.constant(tokenizer.encode(audio, return_tensors='tf')))
input_ids = []
input_ids = input_ids.append(int(tf.constant(tokenizer.encode(audio, return_tensors='tf'))))
output = model(str(input_ids))[0]

No change to the output

  • Opening a wav file in binary like that does not make any sense, you are passing raw bytes to a model, you need to use a proper library to open wav files. – Dr. Snoopy Jan 08 '23 at 16:16
  • Is there a list of libraries that can process audio files? I could not find example code that goes from a local file to the format needed. – Daniel Luca CleanUnicorn Jan 16 '23 at 13:08
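For anyone with the same question as the second comment, here is a minimal sketch of going from a local file to something Whisper can consume. The error in the question most likely comes from `tokenizer.encode`, which expects text (a string or token ids) and was handed the raw bytes of the file; the audio needs to be decoded into samples and passed through the feature extractor instead. The helper names `read_wav_as_floats` and `transcribe` are made up for this example. The sketch assumes a 16-bit PCM WAV file that is already sampled at 16 kHz (the rate Whisper expects), and it swaps `TFWhisperModel` for `TFWhisperForConditionalGeneration`, since the bare model does not generate text. Only the WAV decoding part uses the standard library; libraries such as librosa or soundfile can load and resample audio more flexibly.

```python
import array
import sys
import wave


def read_wav_as_floats(path):
    """Decode a 16-bit PCM WAV file into mono float samples in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    # Average the interleaved channels down to mono and scale to floats.
    samples = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return samples, sample_rate


def transcribe(path, model_name="openai/whisper-base"):
    """Hypothetical helper: transcribe one short 16 kHz WAV file with Whisper."""
    # Imported lazily so the WAV helper above works without these packages.
    from transformers import TFWhisperForConditionalGeneration, WhisperProcessor

    samples, sample_rate = read_wav_as_floats(path)
    if sample_rate != 16000:
        raise ValueError("Whisper expects 16 kHz audio; resample the file first")

    processor = WhisperProcessor.from_pretrained(model_name)
    model = TFWhisperForConditionalGeneration.from_pretrained(model_name)

    # The processor turns raw samples into log-mel input features;
    # the tokenizer is only used afterwards, to decode the generated ids.
    inputs = processor(samples, sampling_rate=sample_rate, return_tensors="tf")
    generated_ids = model.generate(inputs.input_features)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

With this in place, `transcribe("data/preamble.wav")` would return the text to write to the output file. Note that the feature extractor works on 30-second windows, so a longer recording would need to be split into chunks first.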
