I'm trying to execute the following code using TensorFlow, Hugging Face's Transformers, and the openai/whisper-base model:
import tensorflow as tf
import transformers
# Load the model and tokenizer
model = transformers.TFWhisperModel.from_pretrained('openai/whisper-base')
tokenizer = transformers.WhisperTokenizer.from_pretrained('openai/whisper-base')
# Read the audio file and convert it to a tensor
audio_file = "data/preamble.wav"
with open(audio_file, 'rb') as f:
    audio = f.read()
input_ids = tf.constant(tokenizer.encode(audio, return_tensors='tf'))
# Transcribe the audio
output = model(input_ids)[0]
transcription = tokenizer.decode(output, skip_special_tokens=True)
with open("something.txt", "w") as f:
    f.write(transcription)
I'm getting a huge error output, too big to copy and paste here, so below is a snippet. The entire message consists of the same kind of output except for the last line, which I've pasted below. The attached picture is the top of the error message, which I had to screenshot before it disappeared.
[Picture: top of the error message (the first output to the terminal after running the script)]
Bottom of the error snippet:
c\xff\x0c\x00\xeb\xff\xb3\xff\xc5\xff\x0f\x00\xde\xff\x16\x00B\x00\x0e\x00\xfd\xff$\x000\x00\xff\x
ff\xe7\xff<\x00\xfb\xff\n\x00/\x008\x00\x06\x00\x17\x00\x1d\x00\xde\xff\xf2\xff\xec\xff\xff\xff\x0
f\x00\x1b\x008\x00\x1d\x003\x00%\x00#\x00\r\x00\x16\x00\x1d\x00\x19\x00\xf7\xff\x14\x00\xff\xff\xc
c\xff\x06\x00\xf1\xff\x11\x00\xf0\xff*\x00P\x00\xe7\xffH\x00\t\x00\xd0\xff\xd0\xff\xee\xff\xf6\xff
\xc6\xff\xe4\xff\xce\xff' is not valid. Should be a string, a list/tuple of strings or a list/tuple
of integers.
The last line, "is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.", is my only clue as to my next step.
I cannot scroll to the top of the terminal to find which part of my code is throwing the error. I'm new to machine learning and I don't know what I'm seeing. Any help is appreciated.
Thank you in advance!!!
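One other thing I noticed: f.read() on a file opened with 'rb' gives back raw bytes, and as far as I can tell even a tiny byte string fed to tokenizer.encode produces the same message. This is my attempt at a minimal repro, though I'm not sure it's the real cause of my error:

import transformers

tokenizer = transformers.WhisperTokenizer.from_pretrained('openai/whisper-base')

# f.read() returns bytes, so try a tiny byte string directly
tokenizer.encode(b'\xc6\xff\xe4\xff', return_tensors='tf')
# this seems to raise the same "... is not valid. Should be a string,
# a list/tuple of strings or a list/tuple of integers." error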
I tried a try/except block around output and transcription, with no change: same output message.
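It looked more or less like this (same model, tokenizer, and input_ids as above):

try:
    # the two lines I wrapped
    output = model(input_ids)[0]
    transcription = tokenizer.decode(output, skip_special_tokens=True)
except Exception as e:
    print(e)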
I've tried:
input_ids = str(tf.constant(tokenizer.encode(audio, return_tensors='tf')))
input_ids = []
input_ids = input_ids.append(int(tf.constant(tokenizer.encode(audio, return_tensors='tf'))))
output = model(str(input_ids))[0]
None of these changed the output.
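I'm also wondering whether the audio is supposed to go through WhisperProcessor (the feature extractor) instead of the tokenizer. Is something like the sketch below the right direction? It's only a guess on my part, and it assumes the soundfile package for reading the WAV and that the file is already 16 kHz, which I haven't verified.

import soundfile as sf
import transformers

processor = transformers.WhisperProcessor.from_pretrained('openai/whisper-base')
model = transformers.TFWhisperForConditionalGeneration.from_pretrained('openai/whisper-base')

# Read the waveform as float samples instead of raw bytes
# (Whisper expects 16 kHz audio; I'm assuming the file already is)
audio, sampling_rate = sf.read("data/preamble.wav")

# The processor turns the waveform into log-mel input features
inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="tf")

# generate() produces token ids, which the processor can decode back to text
predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

with open("something.txt", "w") as f:
    f.write(transcription)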