Intuitively, the effect of this transformation on a spectrogram (where the x-axis is time and the y-axis is frequency) is to stretch it along the y-axis by an amount that depends on alpha, while the top (maximum frequency) and the bottom (zero frequency) stay fixed. However, I don't know how to implement it.
First, at which step should I apply the frequency warping? I'm using Librosa to extract features and convert audio to log-mel spectrograms. Should the warping be done before the conversion to a mel spectrogram, or before/after the STFT?
Second, how can I map each frequency according to the formula? The author mentioned using OpenCV's Geometric Image Transformations, but the only ones I found that seem related are the affine and perspective transformations, and I couldn't achieve this mapping with either of them.
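To make the second question concrete, here is a minimal sketch of the kind of per-row remapping I have in mind, written with NumPy interpolation rather than OpenCV. It is not the author's method. It assumes the warp is applied to a linear-frequency magnitude spectrogram (after the STFT, before the mel filter bank), and it uses a generic piecewise-linear warp that keeps bin 0 and the top bin fixed. The function names and the `f_break_ratio` parameter are hypothetical and stand in for whatever the actual formula specifies.

```python
import numpy as np


def piecewise_linear_warp(freqs, alpha, f_break, f_max):
    """Map each frequency to its warped position; 0 and f_max stay fixed.

    Hypothetical stand-in for the real formula: scale by alpha below
    f_break, then follow a straight line up to (f_max, f_max).
    """
    freqs = np.asarray(freqs, dtype=float)
    upper = alpha * f_break + (f_max - alpha * f_break) * (
        freqs - f_break
    ) / (f_max - f_break)
    return np.where(freqs <= f_break, alpha * freqs, upper)


def warp_spectrogram(spec, alpha, f_break_ratio=0.6):
    """Warp a (n_freq, n_frames) linear-frequency spectrogram along axis 0.

    f_break_ratio is an assumed breakpoint, given as a fraction of the
    top bin index. The warp must stay strictly increasing, which holds
    for alpha close to 1 (e.g. 0.9 to 1.1).
    """
    spec = np.asarray(spec, dtype=float)
    n_freq = spec.shape[0]
    bins = np.arange(n_freq, dtype=float)
    f_max = float(n_freq - 1)

    # warped_pos[i] is where the content of source bin i should end up.
    warped_pos = piecewise_linear_warp(bins, alpha, f_break_ratio * f_max, f_max)

    out = np.empty_like(spec)
    for t in range(spec.shape[1]):
        # Treat the frame as samples located at warped_pos and resample
        # them back onto the regular bin grid with linear interpolation.
        out[:, t] = np.interp(bins, warped_pos, spec[:, t])
    return out
```

With `alpha = 1.0` this should return the input unchanged, and with `alpha = 1.1` a peak in bin 10 should move to roughly bin 11. If OpenCV is preferred, `cv2.remap` accepts arbitrary per-pixel coordinate maps, so it might be the function the author was referring to, but that is only a guess on my part.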
Any suggestions or comments are welcome. Thank you so much!