On Whisper API, when I try to use a Python script to transcribe audio files in bulk, I can't get the correct response_format ('srt' or 'vtt') to work

Question

I'm using this code to connect to the Whisper API and transcribe all the MP3 files in a folder to both SRT and VTT:

import requests
import os
import openai

folder_path = "/content/audios/"
def transcribe_and_save(file_path, format):
    url = 'https://api.openai.com/v1/audio/transcriptions'
    headers = {'Authorization': 'Bearer MyToken'}
    files = {'file': open(file_path, 'rb'), 
            'model': (None, 'whisper-1'),
            'response_format': format}
    response = requests.post(url, headers=headers, files=files)
    output_path = os.path.join(folder_path, os.path.splitext(filename)[0] + '.' + format)
    with open(output_path, 'w') as f:
        f.write(response.content.decode('utf-8'))

for filename in os.listdir(folder_path):
    if filename.endswith('.mp3'):
        file_path = os.path.join(folder_path, filename)
        transcribe_and_save(file_path, 'srt')
        transcribe_and_save(file_path, 'vtt')
else:
    print('mp3s not found in folder')

When I use this code, I'm getting the following error:

{
  "error": {
    "message": "1 validation error for Request\nbody -> response_format\n  value is not a valid enumeration member; permitted: 'json', 'text', 'vtt', 'srt', 'verbose_json' (type=type_error.enum; enum_values=[<ResponseFormat.JSON: 'json'>, <ResponseFormat.TEXT: 'text'>, <ResponseFormat.VTT: 'vtt'>, <ResponseFormat.SRT: 'srt'>, <ResponseFormat.VERBOSE_JSON: 'verbose_json'>])",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

I've tried different values, but they either don't work or I only receive the transcription as a plain-text object, with no SRT or VTT. I expect to get SRT and VTT files in the same folder as the audio files.

Thanks, Javi

score 3 · Answer 1 · answered Mar 07 '23 at 07:12

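I've found the solution: the problem was the response_format parameter. Inside `files=`, requests sends a bare string as a file upload (using the field name as the filename), so the API doesn't receive it as a plain form value. Wrapping it as 'response_format': (None, output_format), the same way 'model' already was, sends it as a regular form field: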

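import os
import requests

folder_path = "/content/audios/"

def transcribe_and_save(file_path, output_format):
    url = 'https://api.openai.com/v1/audio/transcriptions'
    headers = {'Authorization': 'Bearer myToken'}
    # Open the audio in a with-block so the handle is closed after the upload
    with open(file_path, 'rb') as audio_file:
        files = {'file': audio_file,
                 # (None, value) = plain form field with no filename
                 'model': (None, 'whisper-1'),
                 'response_format': (None, output_format)}
        response = requests.post(url, headers=headers, files=files)
    # Stop instead of saving an API error message as a subtitle file
    response.raise_for_status()
    # "clip.mp3" -> "clip.srt" / "clip.vtt" in the same folder
    output_path = os.path.join(folder_path, os.path.splitext(os.path.basename(file_path))[0] + '.' + output_format)
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(response.content.decode('utf-8'))

# A for-loop's else branch runs whenever the loop finishes without break,
# so the original message always printed; check for MP3s explicitly instead.
mp3_files = [name for name in os.listdir(folder_path) if name.endswith('.mp3')]
if not mp3_files:
    print('mp3s not found in folder')
for filename in mp3_files:
    file_path = os.path.join(folder_path, filename)
    transcribe_and_save(file_path, 'srt')
    transcribe_and_save(file_path, 'vtt')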
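Why the tuple matters: requests treats every entry in `files=` as a file upload, and only a tuple whose first element (the filename) is `None` is encoded without a filename. A bare string gets the field name as its filename, so the server plausibly sees an uploaded file instead of the text `srt`, which would fail the enum check. The sketch below builds the multipart body offline with `requests.Request(...).prepare()` (nothing is sent) so the two encodings can be compared; the helper name `encoded_body` is hypothetical.

```python
import requests

def encoded_body(files):
    """Prepare (but do not send) a POST and return its multipart body as text."""
    prepared = requests.Request(
        "POST",
        "https://api.openai.com/v1/audio/transcriptions",
        files=files,
    ).prepare()
    return prepared.body.decode("utf-8")

# Bare string: requests treats it as a file and uses the key as the filename.
bare = encoded_body({"response_format": "srt"})
# (None, value): no filename, so it is encoded as a plain form field.
tupled = encoded_body({"response_format": (None, "srt")})

print(bare)
print(tupled)
```

In `bare` the part header carries `filename="response_format"`; in `tupled` it is just `name="response_format"`.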

You are a genius! Following your format, if you want English subtitles from foreign-language audio, just add `'language': (None, 'en')` to `files`. – QuantumBlindVimTaoist, Apr 28 '23 at 19:55

score 0 · Answer 2 · answered Mar 05 '23 at 20:33


I'm not sure about the Whisper API, but you seem to be using the name of an existing Python built-in, `format`, as a parameter name. Perhaps this is why it isn't working: the built-in `format` function might be used when calling the endpoint instead of the argument you passed in.

Try renaming the parameter to something other than `format`, and update the value passed as `response_format` accordingly.
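A quick way to check the shadowing idea: a parameter named `format` hides the built-in `format()` only inside that function, and within it the name refers to whatever argument was passed. The sketch below (function names are illustrative, not from the question) shows both spellings produce the same payload, which fits the asker's follow-up that renaming alone didn't remove the error; renaming is still good hygiene.

```python
def payload_with_format(format):
    # Inside this function, `format` is the argument; the built-in format()
    # is hidden only within this scope.
    return {"response_format": format}

def payload_with_output_format(output_format):
    return {"response_format": output_format}

print(payload_with_format("srt"))         # {'response_format': 'srt'}
print(payload_with_output_format("srt"))  # {'response_format': 'srt'}
print(format(3.14159, ".2f"))             # 3.14 (built-in untouched outside)
```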


Thomasssb1


Thanks, Thomasssb1! Yes, you're right, my mistake :). I changed that, but I'm still getting the same error. If I remove the response_format parameter (files = {'file': open(file_path, 'rb'), 'model': (None, 'whisper-1')} #'response_format': response_format}), I don't get an error, but I only get an object with the transcribed text: {"text":"This is Stella. She's eight."}. What I want is the SRT and VTT formats, not just the transcription. Thanks! – waghler Mar 06 '23 at 11:09

score 0 · Answer 3 · answered Mar 21 '23 at 15:54

Here's a working solution for a single file. It passes the text parameters through `data=` and only the audio through `files=`:

import requests
import os

OPENAI_API_KEY = "123xyzxyzxyzxyzxyzxyzxyzxyz"

token = f"Bearer {OPENAI_API_KEY}"

url = "https://api.openai.com/v1/audio/transcriptions"
model_name = "whisper-1"

# Don't set Content-Type by hand: with files=, requests adds
# "multipart/form-data; boundary=..." itself, and a header
# without the boundary stops the server from parsing the body.
headers = {
    "Authorization": token,
}

file_path = "1.mp3"
with open(file_path, "rb") as file:
    file_content = file.read()

# Text parameters in data= are sent as plain form fields.
# Use "srt" or "vtt" here to get subtitles instead of JSON.
payload = {
    "name": os.path.basename(file_path),
    "response_format": "json",
    "prompt": "transcribe this Chapter",
    "language": "de",
    "model": model_name
}

# Only the audio itself is sent as a file upload.
files = {
    "file": (os.path.basename(file_path), file_content, "audio/mp3")
}

response = requests.post(url, headers=headers, data=payload, files=files)

print(response.text)
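The `data=` approach above could be adapted to the bulk loop in the question. A minimal sketch, assuming you want to inspect each request before sending it: `build_transcription_request` is a hypothetical helper, `demo.mp3` and `sk-placeholder` are placeholders, and `audio/mpeg` is used as the MIME type instead of `audio/mp3`. Nothing is sent over the network here; the request is only prepared so its headers and body can be checked.

```python
import os
import tempfile
import requests

API_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_transcription_request(file_path, response_format, api_key):
    """Prepare (but don't send) a transcription request.

    Text parameters go in data=, so requests encodes them as plain form
    fields; only the audio goes in files=. No Content-Type is set by hand,
    so requests writes multipart/form-data with its own boundary.
    """
    with open(file_path, "rb") as audio:
        # prepare() reads the file while it is still open
        return requests.Request(
            "POST",
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data={"model": "whisper-1", "response_format": response_format},
            files={"file": (os.path.basename(file_path), audio, "audio/mpeg")},
        ).prepare()

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.mp3")
    with open(path, "wb") as f:
        f.write(b"not real audio")  # placeholder bytes, enough to build the request
    req = build_transcription_request(path, "srt", "sk-placeholder")
    body = req.body.decode("utf-8")
    print(req.headers["Content-Type"])
    # To actually upload: response = requests.Session().send(req)
```

In a real bulk run you would call the helper once per MP3 and per format ('srt', 'vtt'), send it, and write `response.text` next to the audio file.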
