Questions tagged [audio-processing]

Audio processing is the study of mathematical and signal processing techniques used to understand or alter audio signals. The kinds of audio signals under study include speech, music, environmental audio and computer audio. Audio is analyzed in the temporal or spectral domain by applying various filters.

A key concept is to convert the audio to PCM (pulse-code modulation) format, which gives you access to the raw audio curve. Each channel has its own curve.

Digital audio is represented by a series of points on this curve. Each point is called an audio sample. The numerical value of each sample can be stored as either an integer or a floating-point number.
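As a rough illustration of per-channel curves and of integer versus floating-point samples, the sketch below builds a tiny 16-bit stereo WAV in memory with Python's standard `wave` module, reads the PCM frames back, splits the interleaved samples by channel, and scales them to floats in [-1.0, 1.0). The helper names (`make_stereo_wav`, `read_channels`), the 8000 Hz rate and the 440 Hz tone are assumptions chosen for the example, not part of the tag description.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (illustrative value)
NUM_FRAMES = 80     # one frame = one sample for every channel


def make_stereo_wav():
    """Build a short 16-bit stereo WAV in memory: 440 Hz tone left, silence right."""
    frames = bytearray()
    for n in range(NUM_FRAMES):
        left = int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
        right = 0
        # "<hh" = two little-endian signed 16-bit integers (left, right)
        frames += struct.pack("<hh", left, right)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # two bytes per sample
        w.setframerate(SAMPLE_RATE)
        w.writeframes(bytes(frames))
    return buf.getvalue()


def read_channels(wav_bytes):
    """Return one list of float samples per channel from 16-bit PCM WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        n_channels = r.getnchannels()
        if r.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = r.readframes(r.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Samples are interleaved (L, R, L, R, ...); slice out each channel
    # and scale the 16-bit integers to floats.
    return [[s / 32768.0 for s in ints[ch::n_channels]]
            for ch in range(n_channels)]


left, right = read_channels(make_stereo_wav())
```

Here `left` and `right` are the two per-channel curves as lists of floats; the same samples exist as signed 16-bit integers inside the WAV data.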

Be aware that storing each sample's numerical value in memory typically requires several bytes. One byte can represent only 2^8 (256) distinct values, which results in noticeable distortion. High-quality audio is typically stored using at least two bytes per sample. Two bytes give 2^16 (65,536) possible values for the height of the raw audio curve as the audio wobbles up and down. The more bytes used per sample, the higher the fidelity, because the gap between adjacent measurable curve heights shrinks. This is called bit depth. CD-quality audio uses two bytes per audio sample per channel. The other fundamental aspect of digital audio is the sample rate, which determines the number of samples per second.
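The arithmetic behind bit depth and sample rate can be sketched in a few lines. The function names below are hypothetical, and the 44,100 Hz stereo figures for CD audio are added here as a well-known reference point rather than taken from the text above.

```python
def quantization_levels(bytes_per_sample):
    """Number of distinct values a sample of this size can take."""
    return 2 ** (8 * bytes_per_sample)


def step_size(bytes_per_sample):
    """Gap between adjacent levels when the curve spans [-1.0, 1.0)."""
    return 2.0 / quantization_levels(bytes_per_sample)


def quantize(x, bytes_per_sample):
    """Round x to the nearest representable curve height."""
    step = step_size(bytes_per_sample)
    return round(x / step) * step


def data_rate(sample_rate, bytes_per_sample, channels):
    """Uncompressed PCM size in bytes per second."""
    return sample_rate * bytes_per_sample * channels


print(quantization_levels(1))   # 256
print(quantization_levels(2))   # 65536
print(data_rate(44100, 2, 2))   # 176400 bytes per second for CD-style audio

# The rounding error is at most half a step, so it shrinks as bit depth grows.
err_8 = abs(quantize(0.3, 1) - 0.3)
err_16 = abs(quantize(0.3, 2) - 0.3)
```

Comparing `err_8` with `err_16` shows why one byte per sample produces audible distortion while two bytes usually do not: the 16-bit step is 256 times smaller.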

556 questions
0
votes
1 answer

Beat tracking with songs from music streaming services

I am planning to write an app that can play the songs from the user's music library (such as Spotify & Apple Music) while carrying beat tracking. The end goal is to show a music visualizer like this: Is it possible to read the full song in advance?…
xyz.dev
  • 3
  • 2
0
votes
1 answer

GStreamer - Generate audio waveform from MP4 file

I have a two-part question: 1) I have an MP4 file and want to generate its audio waveform. 2) I have another MP4 file which has audio at channel [0] and channel [1] and a video track too, I want to generate waveforms for both channels as separate…
Harry
  • 199
  • 1
  • 2
  • 11
0
votes
2 answers

Steganography on audio/video with Python

I want to do message embedding in audio/video files using Python. Does anyone have information about some libraries I can use for bit manipulation in audio/video?
sril
  • 53
  • 1
  • 7
0
votes
2 answers

Java - Adjust playback speed of a WAV file

I'm likely dense but I cannot seem to find a solution to my issue (NOTE: I CAN find lots of people reporting this issue, seems like it happened as a result of newer Java (possible 1.5?). Perhaps SAMPLE_RATE is no longer supported? I am unable to…
Alan Pauley
  • 121
  • 1
  • 10
0
votes
0 answers

Apply filtfilt on successive blocks with initial conditions (to avoid discontinuity)

We have two lowpass filters with a different cutoff value: b, a = signal.butter(2, 0.125) b2, a2 = signal.butter(2, 0.140) When applying the first filter to x[0:10000] and the second to x[10000:20000] with lfilter, we have to use initial…
Basj
  • 41,386
  • 99
  • 383
  • 673
0
votes
1 answer

Best method to speed up/slow down spoken English (NOT music) recording

I am looking for an algorithm to speed up English speech. Algorithms used for speeding up music generate many artifacts over doubled speed, and I am looking for something that works even at speeds of 3x or 4x with acceptable clarity. Voice,…
TFuto
  • 1,361
  • 15
  • 33
0
votes
2 answers

How to extract the common part between two audio signals and remove it from the signal?

If I have two audio signals Y1 and Y2 in Fourier domain that are the results of multiplication of S with H1 and H2 respectively (convolution in time domain): Y1=H1*S Y2=H2*S And I don't have S and H1, H2, but I know that S is the same in both Y1 and…
0
votes
1 answer

Batch Processing using Sox

I know the command for trimming is sox input output trim . But how do I convert all the .wav files in a folder into 1 second audios?
Saad
  • 159
  • 1
  • 2
  • 14
0
votes
2 answers

What is the unit of audio sample in librosa?

These days, I'm using librosa which is a kind of audio processing library. As a basic step to load audio files, one can use the function below. librosa.core.load() Then an audio file is represented as audio time series. I think each value of the…
K. Min
  • 1
0
votes
0 answers

how to plot tensorflow mfcc in python using matplotlib

I followed this example to compute mfcc using tensorflow. To visualize I tried to use matplotlib as mentioned here. But it says Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn. When I…
0
votes
1 answer

iOS - Combine videos to one frame with its own frame

I have a serious problem in making photo / video frame in iOS. The target is to make one video from several videos and several images based on special frame. For example, on this frame, the result video will play video 1,2,3 on the background image…
Mark
  • 271
  • 1
  • 9
0
votes
0 answers

How to apply gaussian filter to raw audio files Python?

I am recording raw files in Python to later break them into phonemes, but the noise in the surrounding environment is hampering the result. So, there is a need to apply a filter to the recorded raw audio files. How can this be done?
shreyansh
  • 108
  • 2
  • 11
0
votes
1 answer

Yin algorithm(Pitch detection) - Alternative to Difference Function

I have implemented Yin Algorithm to detect pitch. My issue is with the Performance of Difference Function(Equation 6) Difference Function: static std::vector difference(const std::vector &data) { int index, tau; double…
0
votes
1 answer

What is correct way to add audio samples into the one without clipping

I generate sound samples at different frequencies (sin/saw/triangle generators) as an array of double values [-1...1] (1 is the maximum amplitude). I'd like to combine all signals into one. 1) If I add (combineWithNormalize) and finally normalize to…
MaratSR
  • 77
  • 1
  • 6
0
votes
1 answer

How to detect word boundaries/estimate words count with audio processing? (w/o speech recognition)

Is it possible to detect word boundaries with basic audio processing offline to get an accurate enough WPM* estimate? I think it can be done by detecting pauses (a pause indicates a word boundary). Will it be cross-lingual and work on all languages? in…