
I have a 2D numpy array of an audio spectrogram and I want to save it as an image.

I'm using the librosa library to compute the spectrogram, and I can plot it with the librosa.display.specshow() function. It supports a number of different frequency scaling types, as listed in the docstring below.

import matplotlib.pyplot as plt
import PIL.Image
import librosa
import librosa.display

def display_spectrogram(spectrum, sampling_rate):
    """
    Frequency types:
    ‘linear’, ‘fft’, ‘hz’ : frequency range is determined by the FFT window and sampling rate.
    ‘log’ : the spectrum is displayed on a log scale.
    ‘mel’ : frequencies are determined by the mel scale.
    ‘cqt_hz’ : frequencies are determined by the CQT scale.
    ‘cqt_note’ : pitches are determined by the CQT scale.
    """

    librosa.display.specshow(spectrum, sr=sampling_rate, x_axis='time', y_axis='log')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Spectrogram')
    plt.show()

I can also convert the spectrogram (a numpy array) to an image and save it, as shown below.

img = PIL.Image.fromarray(spectrum)
img.save("out.png")

I have the original spectrogram (linearly scaled) and I want to save it with the y-axis in log scale. I looked into the library's source code to understand how the scaling is done, but could not figure it out.

How can I log scale the y-axis of an image / 2D numpy array?


[Images: linear matrix; log-scaled result]

enesdemirag
  • @Antimon I don't want to change the values, I just want to squeeze them logarithmically. – enesdemirag Nov 22 '20 at 17:58
  • Nevermind, I misinterpreted your problem. I understand now. But you will have to tell us what the format of the numerical data is. How are the frequencies and time points indexed? – Antimon Nov 22 '20 at 18:03

1 Answer


The actual log transform of the y-axis is done by matplotlib. You can test this by comparing ax.set_yscale('linear') with ax.set_yscale('log'). So the easiest alternative would be to tweak the matplotlib figure to remove ticks, borders, etc. Here is one example of that: https://stackoverflow.com/a/37810568/1967571
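
As a rough sketch of the figure-tweaking route, the snippet below draws the spectrogram on a log y-axis, hides the axes, and writes the bare image to disk. It is an illustration under assumptions, not librosa's own code: save_log_spectrogram, path and hop_length are hypothetical names, ax.pcolormesh stands in for librosa.display.specshow so that only NumPy and matplotlib are needed, and the rows are assumed to be the n_fft/2 + 1 bins of an STFT.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window is needed
import matplotlib.pyplot as plt


def save_log_spectrogram(spectrum, sampling_rate, path, hop_length=512):
    """Save a linear-frequency spectrogram as an image with a log y-axis.

    Assumes `spectrum` has shape (n_fft // 2 + 1, n_frames); `hop_length`
    only affects the (hidden) time coordinates.
    """
    spectrum = np.asarray(spectrum)
    n_bins, n_frames = spectrum.shape

    # Centre frequency of every row, from 0 Hz up to the Nyquist frequency.
    freqs = np.fft.rfftfreq(2 * (n_bins - 1), d=1.0 / sampling_rate)
    times = np.arange(n_frames) * hop_length / sampling_rate

    fig, ax = plt.subplots(figsize=(6, 4))
    # Drop the 0 Hz row: it cannot be placed on a logarithmic axis.
    ax.pcolormesh(times, freqs[1:], spectrum[1:], shading="auto")
    ax.set_yscale("log")  # matplotlib performs the log warping here

    # Remove ticks, labels and borders so only the image remains.
    ax.axis("off")
    fig.subplots_adjust(left=0, right=1, bottom=0, top=1)
    fig.savefig(path, dpi=100)
    plt.close(fig)
```

With the default figsize and dpi used here, the saved file is 600 x 400 pixels; change those two values to control the output resolution.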

If you want to do the log scaling yourself, the steps are:

  • Compute the current frequencies on the y-axis, using librosa.fft_frequencies.
  • Compute the desired frequencies on the y-axis, using numpy.logspace or similar.
  • Sample the spectrogram at the desired frequencies, using for example scipy.interpolate (interp1d).
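
The steps above can be sketched roughly as follows. This is a minimal illustration, not the way librosa does it internally: log_scale_rows, n_out and f_min are hypothetical names, np.fft.rfftfreq is used in place of librosa.fft_frequencies (it should give the same bin frequencies for an STFT), np.geomspace in place of numpy.logspace, and np.interp in place of scipy.interpolate.interp1d, so that the example needs only NumPy.

```python
import numpy as np


def log_scale_rows(spectrum, sampling_rate, n_out=None, f_min=None):
    """Resample the frequency axis (rows) of a linear-frequency
    spectrogram onto logarithmically spaced frequencies.

    Assumes `spectrum` has shape (n_fft // 2 + 1, n_frames).
    Returns the resampled array and the target frequencies in Hz.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    n_bins, n_frames = spectrum.shape

    # Step 1: current (linear) frequency of every row.
    freqs = np.fft.rfftfreq(2 * (n_bins - 1), d=1.0 / sampling_rate)

    # Step 2: desired log-spaced frequencies. Start at the first
    # non-zero bin by default, because log(0) is undefined.
    if n_out is None:
        n_out = n_bins
    if f_min is None:
        f_min = freqs[1]
    target = np.geomspace(f_min, freqs[-1], num=n_out)

    # Step 3: linearly interpolate each time frame at the target frequencies.
    out = np.empty((n_out, n_frames))
    for t in range(n_frames):
        out[:, t] = np.interp(target, freqs, spectrum[:, t])
    return out, target
```

Row 0 of the result is the lowest frequency, so an image built from it shows low frequencies at the top. Flip it with out[::-1] and rescale it to 8-bit values before passing it to PIL.Image.fromarray if low frequencies should sit at the bottom, as in a plot.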
Jon Nordby
  • Thanks @jonnor. Using plt.axis("off") and plt.tight_layout(pad=0) I saved the images. It solves the problem with a different approach. – enesdemirag Dec 06 '20 at 14:35