Questions tagged [audio-processing]

Audio processing is the study of mathematical and signal processing techniques used to understand or alter audio signals. The kinds of audio signals studied include speech, music, environmental audio and computer audio. Audio is analyzed in the temporal or spectral domain by applying various filters.

A key concept is to convert the audio into PCM format, which gives you access to the raw audio curve. Each channel has its own curve.

Digital audio is represented by a series of points on this curve. Each point is called an audio sample. The numerical value of each sample can be represented as either an integer or a floating-point number.
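
As a minimal sketch of the two representations, assuming signed 16-bit little-endian PCM samples (the helper names `int16_to_float` and `float_to_int16` are hypothetical, not from any library), converting between integer and floating-point samples might look like this:

```python
import struct

def int16_to_float(sample: int) -> float:
    """Map a signed 16-bit PCM sample to a float in [-1.0, 1.0)."""
    return sample / 32768.0

def float_to_int16(value: float) -> int:
    """Map a float to a signed 16-bit sample, clipping to [-1.0, 1.0] first."""
    clipped = max(-1.0, min(1.0, value))
    return round(clipped * 32767)

# Four example samples packed as raw 16-bit little-endian PCM bytes
# (assumed layout: one channel, two bytes per sample).
raw = struct.pack("<4h", 0, 16384, -32768, 32767)

# Unpack the bytes back into integer samples, then convert to floats.
samples = struct.unpack("<4h", raw)
floats = [int16_to_float(s) for s in samples]
```

Dividing by 32768 and multiplying by 32767 is one common convention; other scalings exist, so treat the constants here as an assumption rather than a standard.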

Be aware that storing the numerical value of each audio sample in memory typically requires several bytes. One byte can store only 2^8 (256) distinct values, which results in noticeable distortion. High-quality audio is typically stored using at least two bytes per audio sample. Two bytes give 2^16 (65,536) possible values for the height of the raw audio curve as the audio wobbles up and down. The more bytes used per sample, the higher the fidelity, because this reduces the gap between adjacent curve height measurements. This is called bit depth. CD-quality audio uses two bytes per audio sample per channel. The other fundamental aspect of digital audio is the sample rate, which determines the number of samples per second.
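
The arithmetic above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the CD figures of 44,100 samples per second and two channels are assumptions not stated in the text above.

```python
def quantization_levels(bytes_per_sample: int) -> int:
    """Number of distinct curve heights representable at a given bit depth."""
    return 2 ** (8 * bytes_per_sample)

def pcm_bytes_per_second(sample_rate_hz: int, bytes_per_sample: int, channels: int) -> int:
    """Storage needed for one second of uncompressed PCM audio."""
    return sample_rate_hz * bytes_per_sample * channels

one_byte_levels = quantization_levels(1)   # 2^8
two_byte_levels = quantization_levels(2)   # 2^16

# Assumed CD parameters: 44,100 Hz sample rate, 2 bytes per sample, stereo.
cd_bytes_per_second = pcm_bytes_per_second(44100, 2, 2)

print(one_byte_levels, two_byte_levels, cd_bytes_per_second)
```

Under those assumptions, one second of CD-quality audio occupies 176,400 bytes, which shows how bit depth and sample rate together determine storage size.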

556 questions
2
votes
3 answers

How to input audio data into deep learning algorithm?

I'm very new to deep learning, and I'm aiming to use a GAN (Generative Adversarial Network) to recognize emotional speech. I've only seen images used as inputs to most deep learning algorithms, such as GANs, but I'm curious as to how audio data…
2
votes
1 answer

How to implement a FIR high pass filter in Python?

First of all, I asked this question on Stack Exchange and got only concept-related answers, not implementation-oriented ones. So, my problem is that I am trying to create a high-pass filter, which I implemented in Python. from numpy import cos, sin,…
idkman
  • 169
  • 1
  • 15
2
votes
0 answers

What method does Librosa use to calculate Delta-MFCC?

I am trying to generate the delta-MFCCs. Apparently there are several implementations. I found the "regression" formula link here. But I don't understand why Librosa uses a Savitzky-Golay filter, which is a smoothing filter. I have not found any…
Satashree Roy
  • 365
  • 2
  • 9
2
votes
2 answers

How to compare spoken audio against reference recording - language learning

I am looking for a way to compare a user-submitted audio recording against a reference recording in order to give someone a grade or percentage for language learning. I realize that this is a very unscientific way of doing things and…
Bruce Aldridge
  • 2,907
  • 3
  • 23
  • 30
2
votes
1 answer

How can I store the 50ms before and after an audio event in a circular buffer?

I am processing a dataset of 17 hours of .wav audio (16-bit PCM, 192 kHz) to simulate "real-time" processing that will be embedded in an ESP32, Arduino DUE or a RASP, depending on the results. How am I handling that now? First I cut the 17…
2
votes
2 answers

Getting the amplitude (or RMS voltage) of an audio signal captured in C++ by the wavin lib?

I am working on a very basic robotics project and wish to implement voice recognition in it. I know it's a complex thing, but I wish to do it for only 3 or 4 commands (or words). I know that using wavin I can record audio, but I wish to do real-time…
TarunG
  • 602
  • 5
  • 21
2
votes
1 answer

Sound is distorted after multiplying frequency spectrum by constant

I am making a simple sound equalizer that operates in the frequency domain and lets the user adjust frequencies in the sound using 4 sliders. The first one is responsible for 0-5 kHz, the fourth one for 15-20 kHz. The steps are as follows: I read the wav file and store…
mrJoe
  • 500
  • 3
  • 13
2
votes
2 answers

My note detection algorithm is failing in a few cases?

I am using a simple approach to find the musical note using FFT in Python. The steps involved are: reading the sound file (.wave), detecting silence in the file (by computing the sum of the squared elements of the input falling within the window), detecting…
2
votes
0 answers

How to use MFCC TarsosDSP with microphone in Android

In this example (answer): How to get MFCC with TarsosDSP? they show how to use MFCC in Android @Test from a float array. I'm trying to use it with data from the microphone: int sampleRate = 44100; int bufferSize = 8192; int bufferOverlap =…
2
votes
1 answer

Process.getExclusiveCores() throws exception on certain devices

My Android app is currently in open beta and I am receiving crash reports from my beloved testers. Audio processing is the app's primary focus, so the render thread is CPU-intensive and time-sensitive. In an attempt to achieve the best…
2
votes
1 answer

How do I properly set up multiple band-pass filters in AudioKit?

I would like to set up many band-pass filters in AudioKit to separate a sound source into many bands, each for further processing down the bus/chain. AudioKit has nodes in a sequence or bus. Each node takes an input, does something, and produces an output. For the…
Brian H
  • 314
  • 3
  • 13
2
votes
1 answer

FFMPEG results in a silent video when trying to combine video and audio tracks

I'm using the following command to combine a video and an audio track. ffmpeg -y -i /var/www/temp/merged.mp4 -i /var/www/temp/combined.mp3 -strict -2 /var/www/temp/videoExtouiulbjryzxlehjj2.mp4 Edit: Here is the output from the first command ffmpeg…
2
votes
1 answer

FFMPEG Recode all audio streams while keeping originals

I am trying to add an additional set of audio tracks to some video files as part of an automated process. I would like to keep all the original audio tracks and have a second re-coded copy. What I have been using is: ffmpeg -i file -map 0:v…
Olirav
  • 165
  • 2
  • 12
2
votes
1 answer

How to convert an AudioBuffer to a mp3 file?

Is there an easy way of doing that, or do I need to interleave the channels and create a DataView that contains a specific header format as well as the interleaved data?
Maxime Dupré
  • 5,319
  • 7
  • 38
  • 72
2
votes
0 answers

Cross Correlation with signals of different lengths in Java

I am trying to implement cross-correlation computation in Java based on its properties (the fifth one, with the FFT). It should work properly with signals of different lengths. I am using a simple FFT library from Princeton University (FFT.java and…