
I'm using Librosa and I have a memory problem.

I have a lot of audio files, let's say a hundred, and I process them one by one. Each file is loaded in chunks of 1 minute, and each chunk is processed before I move on to the next. This way, I never have more than 60 s of audio in memory at any given time, which should keep memory usage low throughout the whole process.

For some reason, the memory used by the process is growing over time.

Here is a simpler version of the code:

import librosa
import matplotlib.pyplot as plt
import os
import psutil

SAMPLING_RATE = 22050
N_FFT = 2048
HOP_LENGTH = 1024

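def foo():
    # Load one 60 s chunk, starting 600 s into the file, resampled to SAMPLING_RATE.
    y, sr = librosa.load("clip_04.wav", sr=SAMPLING_RATE, offset=600, duration=60)
    # Complex short-time Fourier transform of the chunk.
    D = librosa.stft(y, n_fft=N_FFT, hop_length=HOP_LENGTH)
    spec_mag = abs(D)  # magnitude spectrogram
    spec_db = librosa.amplitude_to_db(spec_mag)  # magnitude in decibels
    return 42  # I return a constant to make sure the memory is (or should be) released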

def main():

    process = psutil.Process(os.getpid())

    array = []

    for i in range(100):
        foo()
        m = int(process.memory_info().rss / 1024**2)
        array.append(m)

    plt.figure()
    plt.plot(array)
    plt.xlabel('iterations')
    plt.ylabel('MB')
    plt.show()

if __name__ == '__main__':
    main()

Using this code, the memory increases like this:

[Figure: Memory variation (process memory in MB versus iterations)]

Is that normal? And if it is, is there a way to clear Librosa memory at each iteration?
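For anyone who wants to run the snippet without `clip_04.wav`, a stand-in file could be generated with the standard library alone. This is only a sketch: `write_tone` and the file name `tone.wav` are hypothetical names, not part of the original code, and a plain sine tone is assumed to be a good enough substitute for real audio.

```python
import array
import math
import sys
import wave


def write_tone(path, seconds, sample_rate=22050, freq_hz=440):
    """Write a mono 16-bit PCM sine tone lasting `seconds` whole seconds."""
    # Build one second of samples; with an integer frequency the sine
    # completes whole cycles per second, so repeating it is seamless.
    one_second = array.array(
        "h",
        (
            int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
            for n in range(sample_rate)
        ),
    )
    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        one_second.byteswap()
    frames = one_second.tobytes()

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        for _ in range(seconds):
            wav.writeframes(frames)
```

Calling `write_tone("tone.wav", 660)` would produce 660 s of audio, which is long enough for `offset=600` and `duration=60`; the file name in `librosa.load` would then need to be changed to match.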

user202729
Vincent Garcia
  • As a workaround it's always possible to spawn new subprocess (`multiprocessing` module) and stop them when they're done. – user202729 Feb 21 '21 at 15:33
  • Although is there any way to reproduce the issue without having the `clip_04.wav` file? – user202729 Feb 21 '21 at 15:34
  • Thanks for the multiprocessing suggestion. I can share the `clip_04.wav` file if needed of course, but I think this can be verified with any given audio file. – Vincent Garcia Feb 21 '21 at 15:40
  • Perhaps you can make the code self-contained by including an autogenerated audio file? https://stackoverflow.com/questions/24236678/can-i-use-sox-to-generate-audio – user202729 Feb 21 '21 at 15:42
  • Good idea. I'll give it a try! – Vincent Garcia Feb 21 '21 at 17:04
  • If you are using librosa v0.8, you could also use the built-in examples: https://librosa.org/doc/main/recordings.html – Hendrik Feb 21 '21 at 17:39
  • Is the [cache](https://librosa.org/doc/main/cache.html) perhaps turned on? If so, you might want to purge it using `librosa.cache.clear()` – Hendrik Feb 21 '21 at 17:42
  • I set the cache level to 10 and I tried to use `librosa.cache.clear()` within the for loop: no change. – Vincent Garcia Feb 21 '21 at 17:57
  • I have tried your code with an older *librosa* version (0.6.2) and `librosa.util.example_audio_file()` as audio file. When run for 500 iterations, memory consumption stabilizes after about 100 iterations around roughly 240 MB. What happens in your case, when you run more than 100 iterations? – Hendrik Feb 21 '21 at 18:21
  • I get 477.027 MB. But do you see an increase in memory usage @Hendrik? – Vincent Garcia Feb 21 '21 at 18:48
  • Not after about 100 iterations. It then stays flat, goes up and down a bit. – Hendrik Feb 21 '21 at 19:36
  • How much memory do you have? Unless there is memory pressure, it may not get freed – Jon Nordby Feb 23 '21 at 13:49
  • I have about 4GB of memory. But at some point, the memory is full and the process crashes. – Vincent Garcia Feb 24 '21 at 08:33
  • I have noticed this behaviour also. [This](https://groups.google.com/g/librosa/c/dpkuXbHEqCQ) might be the beginning of an explanation, but that's not a solution. One option is to use PySoundFile for loading, but there is still the resampling problem. For now, I reduce the amount of files to be handled in a single python process, and run the script several times. Not very elegant, though. – guik Mar 01 '21 at 18:58
  • After reporting this issue to Librosa, I was asked to try to use Soundfile directly to read the wav file. The behaviour still remains. I'm not sure where the problem is really, maybe from macOS? But I really don't know. Here is a link to the ticket: https://github.com/librosa/librosa/issues/1286 – Vincent Garcia Mar 02 '21 at 09:03
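The subprocess workaround suggested in the comments could be sketched roughly as follows. It does not explain the memory growth; it only assumes that memory held by a worker process is returned to the operating system when that process exits. `process_file` and `run_isolated` are hypothetical names, and the worker body is a placeholder for the real per-file work.

```python
import multiprocessing as mp
import os


def process_file(path):
    """Hypothetical stand-in for the real per-file work (load chunks, STFT, ...)."""
    # Returning the worker's pid only serves to show which process ran the task.
    return path, os.getpid()


def run_isolated(paths):
    """Run process_file on each path in a short-lived worker process."""
    # maxtasksperchild=1 retires a worker after a single task; chunksize=1
    # makes one task equal one file, so no worker handles two files.
    with mp.Pool(processes=1, maxtasksperchild=1) as pool:
        return pool.map(process_file, paths, chunksize=1)


if __name__ == "__main__":
    for path, pid in run_isolated(["a.wav", "b.wav", "c.wav"]):
        print(f"{path} handled by worker pid {pid}")
```

The `if __name__ == "__main__":` guard matters on platforms where `multiprocessing` starts workers by re-importing the main module. The cost of this approach is the start-up time of one process per file, including re-importing any heavy libraries in each worker.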

0 Answers