
[Image: frequency-warping formula]

Intuitively, when this transformation is applied to a spectrogram (where the x-axis is time and the y-axis is frequency), it stretches the spectrogram along the y-axis by an amount that depends on alpha, while the top (maximum frequency) and the bottom (zero frequency) stay fixed. However, I don't really know how to implement it.
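The formula itself is not reproduced in this text, so the following minimal NumPy sketch substitutes a hypothetical warp, f -> f_max * (f / f_max) ** alpha. It is not the paper's formula; it was picked only because it has the property described above (0 and f_max stay fixed, alpha = 1 is the identity). The names `warp_frequencies` and `warp_spectrogram` are illustrative. Each output bin looks up a fractional source bin given by the warp and linearly interpolates, one frame at a time.

```python
import numpy as np


def warp_frequencies(freqs, f_max, alpha):
    """HYPOTHETICAL warp, not the paper's formula: f -> f_max * (f / f_max) ** alpha.

    Keeps f = 0 and f = f_max fixed; alpha = 1 is the identity, and
    alpha != 1 stretches one end of the axis while compressing the other.
    Requires alpha > 0.
    """
    freqs = np.asarray(freqs, dtype=float)
    return f_max * (freqs / f_max) ** alpha


def warp_spectrogram(spec, alpha):
    """Warp `spec` (shape: n_freq_bins x n_frames, row 0 = lowest frequency).

    The warp is used as a backward lookup: output row i takes its value from
    the fractional source row warp_frequencies(i, n_bins - 1, alpha),
    linearly interpolated between neighbouring bins, frame by frame.
    """
    n_bins, n_frames = spec.shape
    rows = np.arange(n_bins, dtype=float)
    src_rows = warp_frequencies(rows, n_bins - 1, alpha)
    out = np.empty((n_bins, n_frames), dtype=float)
    for t in range(n_frames):
        out[:, t] = np.interp(src_rows, rows, spec[:, t])
    return out


# Usage: warp a random "spectrogram" with 257 bins and 100 frames.
spec = np.random.default_rng(0).random((257, 100))
warped = warp_spectrogram(spec, alpha=1.2)
```

Because the warp is a backward (output-to-source) lookup, a formula that maps source frequencies to output frequencies would need to be inverted before it is plugged into `warp_frequencies`, as long as it maps [0, f_max] onto itself monotonically.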

First, at which step should I apply this frequency warping? I'm using Librosa to extract features and convert audio to log-mel spectrograms. Should it be done before converting to a mel spectrogram, or before/after the STFT?

Second, how can I map each frequency according to the formula? The authors mention that they used OpenCV's Geometric Image Transformations, but the only related functions I found were the affine and perspective transformations, and I didn't manage to achieve this mapping with them.

Any suggestions and comments are welcome. Thank you so much!

B.W. Zhang
  • Can you please link to the paper/resource that introduces this formula / technique? – Jon Nordby Nov 10 '20 at 14:08
  • https://assets.amazon.science/8f/33/04709ab4460da4af7f80528ab61c/self-supervised-classification-for-detecting-anomalous-sounds.pdf Please refer to this paper. – B.W. Zhang Nov 11 '20 at 06:38
