Read an audio file from S3 directly in Python

Question

I want to read an audio file from S3 directly in Python.

First, I record audio in the browser. Here are my Blob settings:

blob = new Blob(audioChunks,{type: 'audio/wav'});

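Then, using Django, I upload this file to S3:

import base64
import random
from io import BytesIO
from django.core.files.storage import default_storage
# Inside the view: 'data' is assumed to be a data URL; keep the base64 part after the comma.
req = request.POST.get('data')
d = req.split(",")[1]
file_content_io = BytesIO(base64.b64decode(d))
s3_path = 'audio/file_name_{}.wav'.format(random.randint(0, 99))
default_storage.save(s3_path, file_content_io)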

Then I download the file directly:

from scipy.io.wavfile import read
from io import BytesIO
from urllib.request import urlopen

# `file` is the URL of the uploaded object
with urlopen(file) as response:
    audio = BytesIO(response.read())
    speech_array = read(audio)  # scipy's read() returns (sample_rate, data)

Now it gives me the following error:

ValueError: File format b'OggS' not understood. Only 'RIFF' and 'RIFX' supported.

Is there a solution? I also tried librosa, but that does not work either. All I want is to read the file directly, without saving it to disk.

score 1 · Answer 1 · answered Apr 09 '23 at 15:29

I don't know whether you are willing to switch from urllib to boto3, but I solved the problem of loading .wav files from S3 directly into Python with boto3:

import io
import boto3
import librosa

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket_name')
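for file in bucket.objects.filter(Prefix='your_prefix'):
    # Read the whole object into memory; nothing is written to disk
    bin_obj = file.get()['Body'].read()
    # librosa.load returns a tuple: (samples, sample_rate)
    data = librosa.load(io.BufferedReader(io.BytesIO(bin_obj)))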
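Note that the error in the question says the downloaded bytes start with `OggS` rather than `RIFF`. That suggests the payload is an Ogg container despite the `.wav` name, possibly because the browser recorded in its own format and the `type` passed to `Blob` only labels the data. A minimal sketch for checking the leading bytes of an in-memory buffer before handing it to a WAV reader follows. The helper names `sniff_container` and `make_wav_bytes` are hypothetical, and the sample data is synthetic rather than taken from S3:

```python
import io
import wave


def sniff_container(data: bytes) -> str:
    """Guess the audio container from the leading magic bytes (hypothetical helper)."""
    if data[:4] in (b"RIFF", b"RIFX") and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    return "unknown"


def make_wav_bytes(n_frames: int = 800, sample_rate: int = 8000) -> bytes:
    """Build a short, silent, mono 16-bit WAV entirely in memory (demo data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()


wav_bytes = make_wav_bytes()
# Only the Ogg magic bytes followed by padding; not a decodable Ogg file.
ogg_like_bytes = b"OggS" + b"\x00" * 24

print(sniff_container(wav_bytes))       # wav
print(sniff_container(ogg_like_bytes))  # ogg

# A real RIFF/WAVE buffer can be read straight from memory, without touching disk.
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    frame_count = reader.getnframes()
print(frame_count)  # 800
```

If the check reports `ogg` for the bytes fetched from S3, a RIFF-only reader such as `scipy.io.wavfile.read` will reject them however they are downloaded. In that case the data would need a decoder that understands Ogg, or the recording would have to be converted to real WAV before upload.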
