
Here is what I am trying to do: I would like to take a WAV file (for example, https://freesound.org/people/thefsoundman/sounds/118513/, which is 109 KB) and analyze it using SPEAR. I've done that, and it produced this output: [image: SPEAR analysis of WAV file]

Now I want to use this data to play an approximation of the sound with the Web Audio API, by creating an oscillator that uses a periodic wave. Something like this:

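// Assumes a browser context; `ctx` was not defined in the original snippet.
const ctx = new AudioContext();

// real = cosine terms, imag = sine terms; index 0 is the DC term.
const real = new Float32Array([ /* lots of numbers */ ]);
const imag = new Float32Array([ /* lots of numbers */ ]);
const wave = ctx.createPeriodicWave(real, imag);

const o = ctx.createOscillator();
o.setPeriodicWave(wave);
o.frequency.value = /* ? */;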

I'm at a loss as to how to convert the output I'm seeing in SPEAR (or any other equivalent tool you might suggest) into the Fourier coefficients (cosine and sine values) expected by the `createPeriodicWave` function. I'm also unsure whether the question even makes sense, and whether it's remotely possible to represent a generic WAV file as a periodic wave table like this.

(In case you ask: the goal here is to determine the smallest possible size needed to play a sound "close to" the original. I realize I could shrink 109 KB considerably by going to a mono, 11 kHz WAV, compressing to MP3, etc., but I would prefer to represent the sound as a series of numbers if possible.)

Any sound experts out there that can give me a next step?

Elliot Nelson
  • Is there a reason you can't use the `ctx.createBuffer` API? – jaket Sep 17 '18 at 05:49
  • @jaket I'm open to using `createBuffer`, but I don't know if that gets me any closer; my impression is I'd basically be creating a buffer of roughly 20,000 floats, and then coming up with the data myself by adding up multiple Math.sin / Math.cos values. That does sound like it gives me maximum control, and could be a useful technique, but I still don't know what formula to use for the audio data. – Elliot Nelson Sep 17 '18 at 10:21
  • SPEAR looks like it's approximating the signal's short-time Fourier transform (STFT) as a sequence of linear functions, so it should be totally possible to convert its coefficients into audio. Does SPEAR output its result in some easily consumable format? What do the black versus gray outputs in the image you posted mean? Is using SPEAR's decomposition an absolute requirement, or are you open to other numerical approximations? – Ahmed Fasih Sep 22 '18 at 19:32
  • @AhmedFasih I'm open to virtually anything, really; I'm trying to find the best way for a layman to take a generic WAV file and attempt to recreate the sound using basic oscillators (understanding that the more complex the original sound, the more unlikely basic oscillators will sound anything like it). – Elliot Nelson Sep 23 '18 at 19:49
  • Ok! So: any signal (including sound) can be represented as a sum of oscillators (sinusoids); that's the crucial insight of Fourier series. An `N`-length signal can be perfectly generated using `N/2` sinusoids, where each sinusoid has an amplitude and a phase (for a total of `N` numbers). As you might imagine, you need only a few sinusoids if your signal is the sum of a few sinusoids. I like the SPEAR image you posted. Can you check and see if SPEAR can export the results of that analysis in some digital format, i.e., a list of time/frequency/amplitudes? That can be converted to sound. – Ahmed Fasih Sep 23 '18 at 21:51
  • Basically SPEAR is producing a simplified short-time Fourier transform: see [my Python answer](https://stackoverflow.com/a/51773435/500207). It'll be easy to convert it to audio using FFTs or even actual sums-of-sinusoids. The thing that's not standard is how to "simplify" the signal, which SPEAR here does for you; once we show you how to convert such a "simplified representation" to audio, then you can experiment with different ways to approximate the audio in the first place (i.e., replacements for SPEAR). – Ahmed Fasih Sep 23 '18 at 21:54
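
A minimal sketch of the sum-of-sinusoids idea from the comments above, not a definitive implementation. It assumes a hypothetical list of partials with `freq` (Hz), `amp` (linear gain) and `phase` (radians) that stay constant over the whole sound; these field names are made up for illustration and are not SPEAR's export format. `renderPartials` adds the sinusoids up into raw samples, and `harmonicsToCoefficients` covers the special case where every partial is an exact harmonic `k` of one fundamental, which is the only case a periodic wave can represent.

```javascript
// Additive synthesis: sum each sinusoid into one buffer of samples.
// `partials` is a hypothetical format: [{ freq, amp, phase }, ...].
function renderPartials(partials, durationSec, sampleRate) {
  const n = Math.floor(durationSec * sampleRate);
  const out = new Float32Array(n);
  for (const { freq, amp, phase = 0 } of partials) {
    const w = (2 * Math.PI * freq) / sampleRate; // radians per sample
    for (let i = 0; i < n; i++) {
      out[i] += amp * Math.sin(w * i + phase);
    }
  }
  return out;
}

// Cosine (real) and sine (imag) coefficients for exact harmonics.
// `harmonics` is a hypothetical format: [{ k, amp, phase }, ...], k >= 1.
// Uses A*sin(x + p) = A*sin(p)*cos(x) + A*cos(p)*sin(x).
function harmonicsToCoefficients(harmonics) {
  const maxK = Math.max(0, ...harmonics.map((h) => h.k));
  const real = new Float32Array(maxK + 1); // index 0 (DC) stays 0
  const imag = new Float32Array(maxK + 1);
  for (const { k, amp, phase = 0 } of harmonics) {
    real[k] += amp * Math.sin(phase);
    imag[k] += amp * Math.cos(phase);
  }
  return { real, imag };
}
```

In a browser, the samples from `renderPartials` could be copied into an `AudioBuffer` (created with `ctx.createBuffer(1, samples.length, sampleRate)`, filled with `copyToChannel(samples, 0)`) and played through a buffer source node. The `real` and `imag` arrays could instead be passed to `ctx.createPeriodicWave(real, imag)`, with the oscillator frequency set to the fundamental. Because a periodic wave only describes a steady, strictly periodic tone, partials whose frequency or amplitude change over time would need the buffer approach.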
