
I am converting a Python program to Node.js. The program follows these steps:

  1. The microphone listens, using callbacks
  2. The callbacks perform a Librosa "log_mel_S" extraction
  3. The "log_mel_S" is passed to an AI model for inference
  4. The sound is labeled

I have managed to translate all of the steps and their related code from Python to Node.js, except for the Librosa extraction. This is an example of the required audio shape and type:

import numpy

audio_sample = numpy.zeros(shape=(1024, 100), dtype=numpy.float32)
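Node.js has no NumPy, so a minimal sketch of an equivalent zero-filled float32 buffer uses typed arrays. The names audioSample and audioSampleFlat are hypothetical. Whether your model runtime expects nested rows or one flat row-major buffer plus an explicit shape is an assumption to check against its documentation.

```javascript
// Rough JS counterpart of numpy.zeros(shape=(1024, 100), dtype=numpy.float32):
// 1024 rows, each a zero-initialized Float32Array of length 100.
const ROWS = 1024;
const COLS = 100;
const audioSample = Array.from({ length: ROWS }, () => new Float32Array(COLS));

// Flat, row-major alternative (element [r][c] lives at r * COLS + c),
// which many tensor libraries accept together with a separate shape.
const audioSampleFlat = new Float32Array(ROWS * COLS);
```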

And this is the Librosa piece I need help translating:

import numpy
import librosa

S = numpy.abs(librosa.stft(y=audio_sample, n_fft=1024, hop_length=500)) ** 2
mel_S = numpy.dot(librosa.filters.mel(sr=44100, n_fft=1024, n_mels=64), S).T
log_mel_S = librosa.power_to_db(mel_S, ref=1.0, amin=1e-10, top_db=None)

I found the Meyda package, and it looks like it could be a good substitute, but I am not sure how to approach this. It is unclear to me what Librosa is extracting, so I cannot match it to terms such as Amplitude Spectrum, Power Spectrum, etc. Please help me understand and translate this step.

belferink1996

1 Answer


TL;DR The Amplitude Spectrum is the magnitude (absolute value) of the signal's FFT, and the Power Spectrum is the Amplitude Spectrum squared; the latter is sometimes also referred to as energy. Here is one of the examples from the Meyda website that calculates the Amplitude Spectrum: https://github.com/catalli/audiotrainer-server/blob/df41322906c88cd6f899e8f9b9661ebb949f72e1/index.js#L17
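To make the two terms concrete, here is a small JavaScript sketch, assuming a plain array of samples. It uses a naive O(N^2) DFT purely for readability (a real implementation would use an FFT), and the function name spectra is hypothetical. Meyda's amplitudeSpectrum and powerSpectrum features should correspond to these two quantities, although Meyda's own preprocessing may differ.

```javascript
// Naive O(N^2) DFT, for illustration only; real code would use an FFT.
// Returns the amplitude (|X[k]|) and power (|X[k]|^2) for bins 0..N/2.
function spectra(signal) {
  const n = signal.length;
  const amplitude = [];
  const power = [];
  for (let k = 0; k <= Math.floor(n / 2); k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += signal[t] * Math.cos(angle);
      im += signal[t] * Math.sin(angle);
    }
    // abs of a complex number = its vector length
    const mag = Math.sqrt(re * re + im * im);
    amplitude.push(mag); // Amplitude Spectrum
    power.push(mag * mag); // Power Spectrum
  }
  return { amplitude, power };
}

// Usage: a cosine completing exactly 2 cycles in 8 samples.
const testSignal = Array.from({ length: 8 }, (_, t) => Math.cos((2 * Math.PI * 2 * t) / 8));
const { amplitude, power } = spectra(testSignal);
```

For this cosine all the energy lands in bin 2: the amplitude there is N/2 = 4 and the power is 16, while the other bins are (numerically) zero.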

Long answer:

Now, let's go through your code sample line by line, work out what each line does, and see how to implement it in JavaScript.

  1. S = numpy.abs(librosa.stft(y=audio_sample, n_fft=1024, hop_length=500)) ** 2

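This computes the short-time Fourier transform (STFT) of audio_sample: the signal is cut into overlapping frames of n_fft = 1024 samples that start hop_length = 500 samples apart, each frame is multiplied by a window (Hann by default) and passed through an FFT, and only the 1 + n_fft/2 = 513 non-negative frequency bins are kept. Taking numpy.abs and squaring gives the Power Spectrum, i.e. the Amplitude Spectrum squared. Note that the abs of a complex number is its vector length: sqrt(real_part^2 + imag_part^2).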
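A minimal JavaScript sketch of this step for a 1-D signal, assuming librosa's defaults of a periodic Hann window and center=True. Current librosa versions pad the edges with zeros (pad_mode='constant') while older ones used 'reflect', so check which one your Python code actually ran with; this sketch assumes zeros. The fft and powerSpectrogram helpers are hypothetical names, not a Meyda or librosa API.

```javascript
// Sketch of numpy.abs(librosa.stft(y, n_fft, hop_length)) ** 2 for a 1-D signal.
// Assumptions: periodic Hann window, center=True with zero padding.

// In-place iterative radix-2 FFT; the length must be a power of two.
function fft(re, im) {
  const n = re.length;
  // Reorder the input into bit-reversed index order
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }
  // Combine butterflies of size 2, 4, ..., n
  for (let len = 2; len <= n; len <<= 1) {
    const half = len >> 1;
    const ang = (-2 * Math.PI) / len;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < half; k++) {
        const wr = Math.cos(ang * k);
        const wi = Math.sin(ang * k);
        const a = start + k;
        const b = a + half;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr;
        im[b] = im[a] - ti;
        re[a] += tr;
        im[a] += ti;
      }
    }
  }
}

// Returns one Float64Array of 1 + nFft/2 power values per frame.
function powerSpectrogram(y, nFft = 1024, hopLength = 500) {
  const pad = nFft >> 1;
  // center=True: pad nFft/2 zeros on both sides so frames are centered
  const padded = new Float64Array(y.length + 2 * pad);
  padded.set(y, pad);
  // Periodic Hann window, as used by librosa's default window='hann'
  const hann = Float64Array.from({ length: nFft }, (_, n) => 0.5 - 0.5 * Math.cos((2 * Math.PI * n) / nFft));
  const nFrames = 1 + Math.floor(y.length / hopLength);
  const frames = [];
  for (let f = 0; f < nFrames; f++) {
    const re = new Float64Array(nFft);
    const im = new Float64Array(nFft);
    for (let n = 0; n < nFft; n++) re[n] = padded[f * hopLength + n] * hann[n];
    fft(re, im);
    // Keep the non-negative frequencies and square the magnitude
    const power = new Float64Array(pad + 1);
    for (let k = 0; k <= pad; k++) power[k] = re[k] * re[k] + im[k] * im[k];
    frames.push(power);
  }
  return frames;
}

// Usage: 4000 samples of a cosine sitting exactly on FFT bin 32.
const y = Float64Array.from({ length: 4000 }, (_, t) => Math.cos((2 * Math.PI * 32 * t) / 1024));
const S = powerSpectrogram(y, 1024, 500);
```

This returns frames x bins, which is already the transpose of librosa's (bins, frames) layout. In the usage example every frame has 513 power values, and frames lying fully inside the signal peak at bin 32 (about 1378 Hz if sr = 44100).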

  2. mel_S = numpy.dot(librosa.filters.mel(sr=44100, n_fft=1024, n_mels=64), S).T

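This is a mel spectrogram rather than an MFCC calculation. librosa.filters.mel builds a 64 x 513 matrix of triangular mel filters (for sr=44100 and n_fft=1024), and the dot product with S sums each frame's 513 power bins into 64 mel bands. The trailing .T transposes the result so that each row holds the 64 mel values of one frame. MFCCs are computed from this by additionally taking the log and applying a DCT, so this is only the first stage of an MFCC pipeline.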
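A sketch of the filter bank in JavaScript, written to mirror what I understand to be librosa.filters.mel's defaults: fmin = 0, fmax = sr / 2, htk=False (the Slaney mel scale) and norm='slaney' (area normalization). Treat the match as an assumption and verify it by comparing a few rows against librosa's output. hzToMel, melToHz, melFilterBank and applyMel are hypothetical helper names.

```javascript
// Sketch of librosa.filters.mel(sr, n_fft, n_mels) with its defaults
// (fmin = 0, fmax = sr / 2, htk=False -> Slaney mel scale, norm='slaney'),
// and of numpy.dot(mel_basis, S).T. Verify against librosa before relying on it.

// Slaney mel scale: linear below 1 kHz, logarithmic above
const F_SP = 200 / 3;
const MIN_LOG_HZ = 1000;
const MIN_LOG_MEL = MIN_LOG_HZ / F_SP; // 15 mels
const LOG_STEP = Math.log(6.4) / 27;

function hzToMel(hz) {
  return hz < MIN_LOG_HZ ? hz / F_SP : MIN_LOG_MEL + Math.log(hz / MIN_LOG_HZ) / LOG_STEP;
}

function melToHz(mel) {
  return mel < MIN_LOG_MEL ? mel * F_SP : MIN_LOG_HZ * Math.exp(LOG_STEP * (mel - MIN_LOG_MEL));
}

// Returns nMels rows of 1 + nFft/2 triangular filter weights.
function melFilterBank(sr = 44100, nFft = 1024, nMels = 64) {
  const nBins = 1 + Math.floor(nFft / 2);
  const fftFreqs = Array.from({ length: nBins }, (_, k) => (k * sr) / nFft);
  // nMels + 2 band edges, equally spaced on the mel scale from 0 Hz to sr / 2
  const maxMel = hzToMel(sr / 2);
  const edges = Array.from({ length: nMels + 2 }, (_, i) => melToHz((i * maxMel) / (nMels + 1)));
  const basis = [];
  for (let i = 0; i < nMels; i++) {
    const row = new Float64Array(nBins);
    const enorm = 2 / (edges[i + 2] - edges[i]); // Slaney area normalization
    for (let k = 0; k < nBins; k++) {
      const lower = (fftFreqs[k] - edges[i]) / (edges[i + 1] - edges[i]);
      const upper = (edges[i + 2] - fftFreqs[k]) / (edges[i + 2] - edges[i + 1]);
      row[k] = Math.max(0, Math.min(lower, upper)) * enorm;
    }
    basis.push(row);
  }
  return basis;
}

// numpy.dot(melBasis, S).T with S stored as frames: each frame becomes nMels values.
function applyMel(melBasis, powerFrames) {
  return powerFrames.map((frame) =>
    melBasis.map((filter) => filter.reduce((acc, w, k) => acc + w * frame[k], 0))
  );
}

const melBasis = melFilterBank(44100, 1024, 64);
```

Because the power frames are stored as rows (frames x bins), applying each filter to each frame directly yields the frames x 64 layout that the .T produces in the Python code.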

  3. log_mel_S = librosa.power_to_db(mel_S, ref=1.0, amin=1e-10, top_db=None)

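This last step converts the result to decibels (dB): 10 * log10(max(S, amin)) - 10 * log10(max(ref, amin)). With ref=1.0, amin=1e-10 and top_db=None this is simply 10 * log10(max(S, 1e-10)), so zero values become -100 dB instead of -infinity, and the dynamic range is not clipped.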
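A minimal JavaScript sketch of power_to_db under the same parameters, assuming the mel spectrogram is an array of frames (rows of 64 numbers); powerToDb is a hypothetical name. It follows the formula above, including the optional top_db clipping that your call disables with top_db=None.

```javascript
// Sketch of librosa.power_to_db(S, ref=1.0, amin=1e-10, top_db=None) for a
// 2-D input (frames x mel bands):
//   dB = 10 * log10(max(S, amin)) - 10 * log10(max(|ref|, amin))
function powerToDb(S, ref = 1.0, amin = 1e-10, topDb = null) {
  const refDb = 10 * Math.log10(Math.max(amin, Math.abs(ref)));
  const out = S.map((row) =>
    Array.from(row, (v) => 10 * Math.log10(Math.max(amin, v)) - refDb)
  );
  if (topDb !== null) {
    // Optional: clip everything more than topDb below the global peak
    const peak = out.reduce((m, row) => row.reduce((a, v) => Math.max(a, v), m), -Infinity);
    for (const row of out) {
      for (let i = 0; i < row.length; i++) row[i] = Math.max(row[i], peak - topDb);
    }
  }
  return out;
}

// Usage: approximately [[0, 10, 20, -100]]
const logMelExample = powerToDb([[1, 10, 100, 0]], 1.0, 1e-10, null);
```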

I will extend this answer with a JS code sample later; I am submitting it now because I think it is already helpful as it is.

Anatoly T