
I have a feature-extraction REST API written in Python using the librosa library. It receives an audio file through HTTP POST and responds with a list of features (such as MFCCs).

Since librosa depends on SoundFile (libsndfile1 / libsndfile-dev), it doesn't support all formats, so I'm converting the audio file using the ffmpeg-python wrapper (https://kkroening.github.io/ffmpeg-python/).

It works just fine on my Windows 10 machine with Conda, but when I deploy it on Heroku, the librosa.load() function returns an unknown format error, no matter what format I convert the file to. I have tried FLAC, AIFF and WAV.

My first guess is that the converted format isn't supported by libsndfile1, but it works on my local server (plus, their documentation says AIFF and WAV are supported), so I'm a little lost.

I have attached all the relevant snippets of code below, and I can provide anything extra if necessary. Any help is highly appreciated. Thanks.

UPDATE1:

I am using pipes instead of writing to and reading from disk, which is worth mentioning as the question could be misleading otherwise.
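
One check that follows from using pipes is to look at the first bytes ffmpeg wrote to stdout before handing the buffer to librosa. The sketch below is only a diagnostic, not part of the deployed app. `sniff_container` is a hypothetical helper name, and it only recognizes the RIFF/WAVE, FORM/AIFF and fLaC signatures of the three formats I tried. An empty or "unknown" result would suggest that the conversion step, rather than libsndfile, is where things go wrong.

```python
def sniff_container(buf: bytes) -> str:
    """Guess the audio container from the leading bytes of a buffer.

    Hypothetical diagnostic helper: returns "wav", "aiff", "flac",
    "empty" or "unknown". It inspects signatures only and does not
    validate the rest of the file.
    """
    if not buf:
        return "empty"
    head = buf[:12]
    # WAV: "RIFF" <4-byte size> "WAVE"
    if len(head) == 12 and head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # AIFF / AIFF-C: "FORM" <4-byte size> "AIFF" or "AIFC"
    if len(head) == 12 and head[:4] == b"FORM" and head[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    # FLAC streams start with the marker "fLaC"
    if head[:4] == b"fLaC":
        return "flac"
    return "unknown"
```

In the handler this could be logged right after `proc.communicate(...)`, for example `app.logger.info(sniff_container(audioFile))` while `audioFile` is still a bytes object.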

The log:

File "/app/app.py", line 31, in upload
x , sr = librosa.load(audioFile,mono=True,duration=5)
File "/app/.heroku/python/lib/python3.6/site-packages/librosa/core/audio.py", line 164, in load
six.reraise(*sys.exc_info())
File "/app/.heroku/python/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/app/.heroku/python/lib/python3.6/site-packages/librosa/core/audio.py", line 129, in load
with sf.SoundFile(path) as sf_desc:
File "/app/.heroku/python/lib/python3.6/site-packages/soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "/app/.heroku/python/lib/python3.6/site-packages/soundfile.py", line 1184, in _open
"Error opening {0!r}: ".format(self.name))
File "/app/.heroku/python/lib/python3.6/site-packages/soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening <_io.BytesIO object at 0x7f46ad28beb8>: File contains data in an unknown format.
10.69.244.94 - - [15/Mar/2020:12:37:28 +0000] "POST /receiveWav HTTP/1.1" 500 290 "-" "curl/7.55.1"

Flask/Librosa code deployed on Heroku (app.py):

from flask import Flask, jsonify, request
import scipy.optimize
import os,pickle
import numpy as np
from sklearn.preprocessing import StandardScaler
import librosa
import logging
import soundfile as sf
from pydub import AudioSegment
import subprocess as sp
import ffmpeg
from io import BytesIO

logging.basicConfig(level=logging.DEBUG)

app = Flask(__name__) 

@app.route('/receiveWav',methods = ['POST'])
def upload():
    if(request.method == 'POST'):
        f = request.files['file']
        app.logger.info(f'AUDIO FORMAT\n\n\n\n\n\n\n\n\n\n: {f}')
        proc = (
            ffmpeg.input('pipe:')
            .output('pipe:', format='aiff')
            .run_async(pipe_stdin=True,pipe_stdout=True, pipe_stderr=True)
        )
        audioFile,err = proc.communicate(input=f.read())
        audioFile =  BytesIO(audioFile)
        scaler = pickle.load(open("scaler.ok","rb"))
        x , sr = librosa.load(audioFile,mono=True,duration=5)
        y=x
        #Extract the features
        chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
        spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
        spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
        rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
        zcr = librosa.feature.zero_crossing_rate(y)
        rmse = librosa.feature.rms(y=y)
        mfcc = librosa.feature.mfcc(y=y, sr=sr)
        features = f'{np.mean(chroma_stft)} {np.mean(rmse)} {np.mean(spec_cent)} {np.mean(spec_bw)} {np.mean(rolloff)} {np.mean(zcr)}'    
        for e in mfcc:
            features += f' {np.mean(e)}'
        input_data2 = np.array([float(i) for i in features.split(" ")]).reshape(1,-1)
        input_data2 = scaler.transform(input_data2)
        return jsonify(input_data2.tolist())
  
# driver function 
if __name__ == '__main__':   
    app.run(debug = True) 
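
The handler above discards `err` and never looks at ffmpeg's exit status, so a failed conversion would still reach librosa.load() as an empty or truncated buffer. Below is a minimal sketch of a stricter version of that step, using the standard subprocess module rather than ffmpeg-python. `convert_via_pipe` is a hypothetical name, and the ffmpeg command line in the usage note assumes an ffmpeg binary is on PATH.

```python
import subprocess


def convert_via_pipe(cmd, data: bytes) -> bytes:
    """Feed `data` to `cmd` on stdin and return what it writes to stdout.

    Hypothetical helper: raises RuntimeError, including the tail of the
    child's stderr, if the process exits non-zero or produces no output.
    """
    proc = subprocess.run(
        cmd,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode != 0 or not proc.stdout:
        # Keep only the last part of stderr so the log stays readable.
        tail = proc.stderr.decode("utf-8", "replace")[-500:]
        raise RuntimeError(
            f"converter failed (exit {proc.returncode}): {tail}"
        )
    return proc.stdout
```

For example, `convert_via_pipe(["ffmpeg", "-i", "pipe:0", "-f", "aiff", "pipe:1"], f.read())` would either return the converted bytes or raise with the end of ffmpeg's stderr, which should then show up in the Heroku log.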

Aptfile:

libsndfile1
libsndfile-dev
libav-tools
libavcodec-extra-53 
libavcodec-extra-53
ffmpeg 

requirements.txt:

aniso8601==8.0.0
audioread==2.1.8
certifi==2019.11.28
cffi==1.14.0
Click==7.0
decorator==4.4.2
ffmpeg-python==0.2.0
Flask==1.1.1
Flask-RESTful==0.3.8
future==0.18.2
gunicorn==20.0.4
itsdangerous==1.1.0
Jinja2==2.11.1
joblib==0.14.1
librosa==0.7.2
llvmlite==0.31.0
MarkupSafe==1.1.1
marshmallow==3.2.2
numba==0.48.0
numpy==1.18.1
pycparser==2.20
pydub==0.23.1
pytz==2019.3
resampy==0.2.2
scikit-learn==0.22.2.post1
scipy==1.4.1
six==1.14.0
SoundFile==0.10.3.post1
Werkzeug==1.0.0
wincertstore==0.2
Rohan Bojja
  • Librosa usually does accept WAV and MP3 in my experience. I think you should do two things. First, figure out if you can play the problematic WAVs at all, and if they are indeed mono. Is there a specific reason why you pass arguments other than the file name? Librosa is usually quite good at inferring those. Second, if there is still a problem, try to open the audio file using sounddevice or some other library. It should be a good control. – boomkin Mar 15 '20 at 13:02
  • Also, it needs a filename as input. I think you are feeding in a BytesIO object; convert it to a proper string filename, otherwise I don't think it will work. – boomkin Mar 15 '20 at 13:03
  • @boomkin Thanks for the reply. The WAVs are indeed playable on my PC. Librosa on my PC loads them and returns the features too. I'll try loading them using another library and get back to you. – Rohan Bojja Mar 15 '20 at 13:05
  • Librosa can load a BytesIO object as well. Since I can't write to disk consistently on Heroku (ephemeral disk), I'm using pipes to feed the data. Will do some testing and get back with more details. – Rohan Bojja Mar 15 '20 at 13:07
  • @RohanBojja Did you manage to make it work? I'm stuck with the same problem. – Niyas Dec 09 '21 at 16:23
