I'm using IBM's Speech-to-Text API to run speaker detection. I used pydub to concatenate several .wav files into one, but I can't pass an AudioSegment object to IBM directly.
My questions are:
Can I export my file directly to an AWS S3 bucket, so I can retrieve it from there later?
How else could I pass the AudioSegment? Can I encode it differently as a variable, exporting it without writing it to disk, if that makes sense?
These are the formats IBM can read:
- application/octet-stream
- audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
- audio/basic (Required. Use only with narrowband models.)
- audio/flac
- audio/g729 (Use only with narrowband models.)
- audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
- audio/mp3
- audio/mpeg
- audio/mulaw
- audio/ogg
- audio/ogg;codecs=opus
- audio/ogg;codecs=vorbis
- audio/wav
- audio/webm
- audio/webm;codecs=opus
- audio/webm;codecs=vorbis
I love pydub and it's been an amazing tool to work with so far. Thank you for making it!