
I've been at this all night. I'm attempting to record myself on my iPhone in Expo using expo-av (recording speech on the iPhone) and upload it to OpenAI's transcriptions endpoint using the whisper-1 model.

The file is saved as mp4 and I convert it to a base64 string. I have confirmed, using a base64-to-file converter and a file-upload checker, that the base64 content is in fact mp4.

Here's the react-native code:

  const recordingOptions = {
    android: {
      extension: ".mp4",
      outputFormat: Audio.AndroidOutputFormat.MPEG_4,
      audioEncoder: Audio.AndroidAudioEncoder.AAC,
      sampleRate: 44100,
      numberOfChannels: 2,
      bitRate: 128000,
    },
    ios: {
      extension: ".mp4",
      // outputFormat: Audio.IOSOutputFormat.MPEG4AAC,
      audioQuality: Audio.IOSAudioQuality.HIGH,
      sampleRate: 44100,
      numberOfChannels: 2,
      bitRate: 128000,
    },
    web: {
      mimeType: "audio/mp4",
      bitsPerSecond: 128000 * 8,
    },
  };
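For context, a minimal sketch of how options like these plug into expo-av's recording flow (this mirrors the expo-av recording API, not code from the question; error handling and UI are omitted, and it assumes it runs inside an Expo app):

```javascript
import { Audio } from "expo-av";

// Sketch: start a recording with the options above and return its file URI.
async function recordOnce(recordingOptions) {
  await Audio.requestPermissionsAsync();
  await Audio.setAudioModeAsync({
    allowsRecordingIOS: true,
    playsInSilentModeIOS: true,
  });

  // createAsync prepares and starts the recording in one call
  const { recording } = await Audio.Recording.createAsync(recordingOptions);

  // ...record for a while, then stop and read back the local file URI
  await recording.stopAndUnloadAsync();
  return recording.getURI();
}
```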

actual implementation:

      const recordingUri = recording.getURI();
      const recordingBase64 = await ExpoFileSystem.readAsStringAsync(
        recordingUri,
        {
          encoding: ExpoFileSystem.EncodingType.Base64,
        }
      );
      const languageCode = "en"; // English
      console.log(languageCode);
      console.log(recordingBase64);

      const buffer = Buffer.from(recordingBase64, "base64");
      const blob = new Blob([buffer], { type: "audio/mp4" });
      const file = new File([blob], "test.mp4", { type: "audio/mp4" });

      const formData = new FormData();
      formData.append("file", file);
      formData.append("model", "whisper-1");

      const apiUrl = "https://api.openai.com/v1/audio/transcriptions";

      const requestOptions = {
        method: "POST",
        headers: {
          Authorization: `Bearer ${OPENAI_API_KEY}`,
        },
        body: formData,
      };

      fetch(apiUrl, requestOptions)
        .then((response) => response.json())
        .then((data) => console.log(data))
        .catch((error) => console.log(error));

and every time the response is:

{"error": {"code": null, "message": "Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']", "param": null, "type": "invalid_request_error"}}

Does anyone have any idea what I'm doing wrong?

jawn
  • I have the same problem. Did you manage to solve this? – Michael Ceber Apr 10 '23 at 21:23
  • @MichaelCeber I ended up passing this base64 to my node backend and using the Buffer class to write the file then read it via fs.createReadStream and pass that to openai's node library. – jawn Apr 11 '23 at 16:06
  • Ahh funny, as after I sent this message, that's exactly what I did, moved it off the UI and to the back end code which was reliable (my back end though is c#) - was always going to put it in the back end anyway as need to secure the api key etc there... – Michael Ceber Apr 12 '23 at 21:07
  • 1
    @MichaelCeber yeah man I tried everything, something's up with the Buffer implementation on the client side for this scenario. glad you got it to work. – jawn Apr 13 '23 at 14:38

1 Answer


Try adding a filename to the formData.append. Something similar to this:

formData.append('file', file, 'input.mp4');

Whisper shouldn't rely on the extension, but it seems like it does.
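You can see the effect of that third argument with the standard FormData API (available in browsers and Node 18+). The bytes below are dummy data, not a real mp4; the point is only that the filename travels with the multipart part:

```javascript
// The third argument to FormData.append sets the filename the server sees;
// Whisper appears to use that extension to detect the container format.
const bytes = new Uint8Array([0, 0, 0, 24]); // dummy bytes, not a real mp4
const blob = new Blob([bytes], { type: "audio/mp4" });

const formData = new FormData();
formData.append("file", blob, "input.mp4"); // filename attached here
formData.append("model", "whisper-1");

console.log(formData.get("file").name); // "input.mp4"
```

In React Native specifically, many people skip the Blob/Buffer round-trip entirely and append `{ uri, name, type }` directly to FormData, which also carries a filename, but the snippet above sticks to the standard API.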

Radu Diță
  • Yes, I confirm this is a good answer, had the same issue and solved it like that. – acortad May 19 '23 at 12:08
  • I tried adding the filename with the above implementation but my code fails when defining the blob. https://stackoverflow.com/questions/76367660/react-native-using-expo-av-ios-mp4-file-openais-audio-transcriptions-invalid-fi – Ibra May 30 '23 at 18:31