Questions tagged [librosa]

librosa is a Python package for music and audio analysis.

Some of the features of librosa:

  • Load audio input
  • Compute mel-spectrogram, MFCC, delta features, chroma
  • Invert mel-spectrogram, MFCC or chroma back to waveform
  • Locate beat events
  • Compute beat-synchronous features
  • Display features
  • Save beat tracker output to a CSV file

For detailed information and examples, visit the librosa documentation.

See also the official GitHub page.

750 questions
1
vote
0 answers

Issues importing librosa in Python: ImportError: cannot import name 'iter_entry_points'

The line that caused the error is: import librosa. I am not sure how to fix it, since it comes from just importing librosa :( It seems the error is from the package itself and not from my code. Any…
Sam
  • 43
  • 6
1
vote
1 answer

Librosa writing an audio time series [y] as float64 even when specified to write as float32

Librosa is writing an audio file with y as dtype=float64 even though I pass it in as float32. I am using version 0.7.2. Am I doing something wrong? Here is basically what I'm doing: y, sr = librosa.load("audio_file", mono=False, sr=None,…
John Lexus
  • 3,576
  • 3
  • 15
  • 33
1
vote
1 answer

Is it possible to convert a time series from mono to stereo?

I am loading an audio file using librosa and would like to potentially convert it to stereo from mono if the file is mono. I'm sure it does not need to be said, but the audio time series is an "np.ndarray [shape=(n,) or (2, n)]". Essentially, I am…
John Lexus
  • 3,576
  • 3
  • 15
  • 33
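One way the mono-to-stereo conversion asked about above could be sketched is by duplicating the single channel with NumPy. The helper name ensure_stereo is hypothetical and not part of librosa; the sketch assumes the (n,) or (2, n) shapes quoted in the question.

```python
import numpy as np

def ensure_stereo(y):
    """Return a (2, n) array; duplicate the channel if y is mono with shape (n,)."""
    y = np.asarray(y)
    if y.ndim == 1:
        # Mono: stack two copies of the same samples as left and right.
        return np.stack([y, y])
    if y.ndim == 2 and y.shape[0] == 2:
        # Already stereo: return it unchanged.
        return y
    raise ValueError(f"expected shape (n,) or (2, n), got {y.shape}")

mono = np.linspace(-1.0, 1.0, 5, dtype=np.float32)
stereo = ensure_stereo(mono)
```

Both channels carry identical samples, so the result is stereo in layout only; the dtype of the input is preserved.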
1
vote
0 answers

Why are 128 mel bands used in mel spectrograms?

I am using the mel spectrogram function, which can be found here: Mel Spectrogram Librosa. I use it as follows: signal = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_fft=512, n_mels=128) Why are 128 mel bands used? I understand that the…
swe87
  • 129
  • 1
  • 3
  • 13
1
vote
0 answers

ValueError: cannot reshape array of size 15183 into shape (1,8000,1)

I am building a speech recognition model. After training the model with mono .wav files (16000 Hz sampling rate), I tried to test it using recorded audio. The recorded audio's parameters were like those of the audio files with which the model was…
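The error in the title follows from arithmetic: the target shape (1, 8000, 1) holds 1 × 8000 × 1 = 8000 elements, while the array has 15183, and a reshape cannot change the element count. A minimal sketch of one possible workaround, assuming the model expects exactly 8000 samples and that trimming or zero-padding is acceptable (the question excerpt does not confirm either); fit_length is a hypothetical helper:

```python
import numpy as np

def fit_length(y, size):
    """Trim or zero-pad a 1-D signal so it has exactly `size` samples."""
    y = np.asarray(y)
    if len(y) >= size:
        return y[:size]
    # Pad with zeros at the end to reach the requested length.
    return np.pad(y, (0, size - len(y)), mode="constant")

clip = np.zeros(15183, dtype=np.float32)   # same size as in the error message
batch = fit_length(clip, 8000).reshape(1, 8000, 1)
```

If the length mismatch comes from a different sampling rate rather than a longer recording, resampling the audio would be the more appropriate fix than trimming.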
1
vote
1 answer

What is the best way to shift the pitch of audio?

I have tried PyRubberband, librosa, praat-parselmouth and pysox. All of them work, but I still hear some noise or small artifacts in the output. Also, they shift the audio by around 100 ms. How can I tune them to get the best possible quality or can…
F. Vosnim
  • 476
  • 2
  • 8
  • 23
1
vote
1 answer

How to decode .ogg opus to int16 NumPy array with librosa?

What I'm trying to do: I'm trying to transcribe Telegram audio messages using Mozilla's speech-to-text engine deepspeech. Using *.wav in 16-bit 16 kHz works flawlessly. I want to add *.ogg opus support, since Telegram uses this format for its audio…
blkpingu
  • 1,556
  • 1
  • 18
  • 41
1
vote
0 answers

Voice Activity Detection using webrtcvad

I am trying to write code that produces a binary output using the webrtcvad module on a .wav audio file by dividing it into small chunks of 20 ms. I want 1 as the output when audio is present and 0 when there is no audio in that small audio…
Aman Singh
  • 23
  • 7
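The 20 ms chunking described in the question above can be sketched with plain byte slicing. At 16 kHz, a 20 ms frame is 320 samples, or 640 bytes of 16-bit mono PCM. The helper frame_bytes is hypothetical, and the webrtcvad calls appear only in comments because they need the package installed.

```python
def frame_bytes(pcm, sample_rate, frame_ms=20, sample_width=2):
    """Yield consecutive frames of frame_ms milliseconds from mono PCM bytes."""
    step = sample_rate * frame_ms // 1000 * sample_width
    # Drop a trailing partial frame, since the VAD needs fixed-size frames.
    for start in range(0, len(pcm) - step + 1, step):
        yield pcm[start:start + step]

one_second = bytes(16000 * 2)  # 1 s of 16-bit silence at 16 kHz
frames = list(frame_bytes(one_second, 16000))

# With webrtcvad installed, each frame could then be classified, e.g.:
#   vad = webrtcvad.Vad(2)
#   flags = [int(vad.is_speech(f, 16000)) for f in frames]
```

The resulting flags list would hold one 0 or 1 per 20 ms frame, which matches the binary output the question asks for.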
1
vote
0 answers

Python Matplotlib - MFCC: X axis is fixed to 2?

I am trying to plot the MFCC features of a sound in matplotlib. Unfortunately, I am not able to plot only the first 0.54 s. It seems that the plot is fixed to 2 s. How can I change that? import matplotlib.pyplot as plt import librosa import…
user8495738
  • 147
  • 1
  • 3
  • 14
1
vote
1 answer

Voice Activity Detection

I am having a problem getting a binary result using webrtcvad on a WAV audio file. I am using librosa to load the audio file in .wav format. Can anyone tell me how to use librosa along with webrtcvad in order to get…
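A likely sticking point here is the sample format: librosa.load returns floating-point samples, while webrtcvad works on 16-bit mono PCM bytes. A minimal sketch of the conversion using only the standard library follows; float_to_pcm16 is a hypothetical helper, and it assumes the samples lie roughly in [-1.0, 1.0].

```python
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    ints = []
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip to avoid integer overflow
        ints.append(int(round(x * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)

pcm = float_to_pcm16([0.0, 0.25, -1.0, 2.0])
```

The sample rate also matters: webrtcvad accepts only 8000, 16000, 32000 or 48000 Hz, so the audio may need to be loaded or resampled at one of those rates before framing.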
1
vote
1 answer

No Backend error despite downloading ffmpeg and setting the path variable in Python

I am getting a no backend error in my Python code when opening an audio file using the librosa module. I have downloaded ffmpeg and set the environment variable, but I am still getting the error. I get this error with the .mp3 extension; with wav it is working…
1
vote
1 answer

Can someone help me understand the np.abs conversion for STFT in librosa?

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> D = np.abs(librosa.stft(y))
>>> D
array([[2.58028018e-03, 4.32422794e-02, 6.61255598e-01, ..., 6.82710262e-04, 2.51654536e-04, 7.23036574e-05], [2.49403086e-03,…
Akash Sonthalia
  • 362
  • 2
  • 12
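For the np.abs question above: an STFT yields complex values, and taking the absolute value keeps each bin's magnitude while discarding its phase. The following sketch illustrates this on a single frame with a naive DFT written in pure Python; it is an illustration of the idea, not librosa's implementation.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of one frame (complex output)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]

# One frame holding exactly one cycle of a cosine, so the energy sits in bin 1
# (and in its mirror image, bin 7).
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(frame)

# abs(a + bj) is sqrt(a**2 + b**2): the magnitude of each frequency bin.
magnitude = [abs(c) for c in spectrum]
```

In the question's code, D is the same kind of magnitude, computed for every frequency bin (rows) and every frame (columns).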
1
vote
1 answer

How to decide filter order in Linear Prediction Coefficients (LPC) while calculating formant frequency features?

I am new to signal processing and trying to calculate formant frequency features for different .wav files. For calculating formant frequency, I need three parameter values: Linear Prediction Coefficients (LPC), root, angle. I am trying to…
Aaditya Ura
  • 12,007
  • 7
  • 50
  • 88
1
vote
2 answers

Ignore all warnings from a module

I'm having some problems with the librosa Python module. It shows me the following warning at import. /opt/anaconda3/envs/pox/lib/python3.6/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that…
Giuppox
  • 1,393
  • 9
  • 35
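One standard-library approach to the question above is warnings.filterwarnings with its module argument, a regular expression matched against the name of the module that triggers the warning. The sketch below uses a stand-in module named noisy_pkg.util (hypothetical, created with exec so the example runs without librosa); for the case in the question, the pattern would presumably be "librosa", and the filter would have to be installed before the import statement.

```python
import warnings

def quiet_module(module_regex, category=Warning):
    """Ignore warnings raised from modules whose name matches module_regex."""
    warnings.filterwarnings("ignore", category=category, module=module_regex)

# Stand-in for a third-party module that warns when used.
noisy_globals = {"__name__": "noisy_pkg.util"}
source = (
    "import warnings\n"
    "def emit():\n"
    "    warnings.warn('deprecated import', DeprecationWarning)\n"
)
exec(source, noisy_globals)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")      # record everything by default
    quiet_module(r"noisy_pkg")           # ...except warnings from noisy_pkg
    noisy_globals["emit"]()              # suppressed
    warnings.warn("still visible", UserWarning)  # recorded
```

Only the warning from the surrounding code is recorded, so warnings from other modules remain visible.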
1
vote
2 answers

How to read the number of channels without ffmpeg or wave?

This code works: import wave f1 = wave.open(file1, "r") num_channels_file1 = int(f1.getnchannels()) but it does not work when reading a wav file with a different bitrate or other property. I can't figure out the difference between wav files or other…
ERJAN
  • 23,696
  • 23
  • 72
  • 146
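One possible cause of the failure described above is a WAV file whose format tag is not plain PCM (for example 32-bit float), which the wave module may reject. The channel count can still be read directly from the file's 'fmt ' chunk. A minimal sketch using only struct follows; wav_channels is a hypothetical helper, and the demo file is a hand-built float-format header with no audio data.

```python
import os
import struct
import tempfile

def wav_channels(path):
    """Read the channel count from a RIFF/WAVE file's 'fmt ' chunk."""
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12 or head[:4] != b"RIFF" or head[8:12] != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                # Layout: format tag (uint16), then channel count (uint16).
                _, channels = struct.unpack("<HH", f.read(4))
                return channels
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(size + (size & 1), 1)

# Demo: a stereo, 32-bit float header (format tag 3) with an empty data chunk.
fmt = struct.pack("<HHIIHH", 3, 2, 44100, 44100 * 2 * 4, 2 * 4, 32)
body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt + b"data" + struct.pack("<I", 0)
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"RIFF" + struct.pack("<I", len(body)) + body)
num_channels = wav_channels(tmp.name)
os.remove(tmp.name)
```

The same function works for ordinary PCM files, since the channel count sits at the same offset in the 'fmt ' chunk regardless of the format tag.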