I am trying to measure the "loudness" of various clips (ranging from ~2 to 40 seconds) of TV content. I'm interested in the relative loudness of the content: which scenes have people shouting vs. whispering, loud music vs. quiet scenes, etc.

I think this means I'm interested in capturing the gain (INPUT loudness) not the volume (OUTPUT loudness)...

I have tried two methods with Python:

  1. librosa's RMS: np.mean(librosa.feature.rms(spectrogram, center=True).T, axis=0)

  2. pyloudnorm (which implements the ITU-R BS.1770-4 loudness algorithm, LUFS):

    import pyloudnorm as pyln

    meter = pyln.Meter(samplerate)  # BS.1770 loudness meter
    loudness = meter.integrated_loudness(waveform)  # integrated loudness in LUFS

When I compare the results of the two, they are sometimes aligned but often different (the same clips show a relatively high RMS but low loudness, and vice versa). More importantly, while both appear to get some things right, neither seems to be a very accurate representation of what is coming out of the TV. Is there some step I need to take to filter out frequencies that are not perceived but still affect these metrics, or am I missing something major?

Peter Mortensen
ginobimura

1 Answer

Loudness, meaning how loud something is perceived to be, can be quite tricky. It is known to be frequency dependent: we are most sensitive to a middle range of frequencies. It is also non-linear with respect to amplitude: doubling the amplitude does not double the perceived loudness.
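To make the non-linearity concrete, here is a small sketch. It assumes the common rule of thumb that a level increase of about 10 dB is heard as roughly twice as loud (the sone scale, which holds reasonably for mid-level sounds around 1 kHz). The function names are only illustrative.

```python
import math


def level_change_db(amplitude_ratio):
    """Level change in dB for a given ratio of signal amplitudes."""
    return 20.0 * math.log10(amplitude_ratio)


def approx_loudness_ratio(level_change):
    """Rule-of-thumb perceived loudness ratio: +10 dB is about twice as loud.

    Assumption: sone-scale approximation, not an exact psychoacoustic model.
    """
    return 2.0 ** (level_change / 10.0)


change = level_change_db(2.0)          # doubling the amplitude
ratio = approx_loudness_ratio(change)  # estimated perceived loudness ratio
print(f"{change:.2f} dB, about {ratio:.2f}x as loud")
```

Under this approximation, doubling the amplitude adds roughly 6 dB, which is heard as only about one and a half times as loud.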

There are also time-dependent effects. At short time scales, a sudden loud sound makes the sounds that follow appear less loud than if the prior sound were not there (temporal masking). At long time scales, we tend to adapt to gradually increasing volumes (desensitization). We also tend to filter out sounds that carry little information (like static or repetitive noise), etc.

You should at least apply frequency weighting. A-weighting is commonly used. This can be done by weighting the STFT spectrogram from librosa and then computing the RMS of the weighted spectrogram. You should also convert the result to decibels.
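The steps above (weight the spectrum, take the RMS, convert to decibels) can be sketched as follows. This version uses only NumPy so that it is self-contained. The function names `a_weighting_db` and `a_weighted_level_db`, the frame sizes, and the window normalisation are my own assumptions, not anything from librosa. In librosa, `librosa.A_weighting`, `librosa.feature.rms(S=...)` and `librosa.amplitude_to_db` cover similar ground.

```python
import numpy as np


def a_weighting_db(freqs_hz):
    """A-weighting gain in dB (standard analytic curve) for frequencies in Hz."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # Small floor avoids log10(0) at the DC bin (0 Hz)
    return 20.0 * np.log10(np.maximum(num / den, 1e-20)) + 2.0


def a_weighted_level_db(y, sr, n_fft=2048, hop=512):
    """Per-frame A-weighted RMS level in dB (relative to full scale).

    Assumes a mono float signal with len(y) >= n_fft and an even n_fft.
    """
    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    gain = 10.0 ** (a_weighting_db(freqs) / 20.0)  # linear gain per bin
    n_frames = 1 + (len(y) - n_fft) // hop
    levels = np.empty(n_frames)
    for i in range(n_frames):
        frame = y[i * hop : i * hop + n_fft] * window
        power = (np.abs(np.fft.rfft(frame)) * gain) ** 2
        power[1:-1] *= 2.0  # one-sided spectrum: count mirrored bins twice
        # Parseval's theorem, normalised by the window energy
        mean_sq = power.sum() / (n_fft * np.sum(window ** 2))
        levels[i] = 10.0 * np.log10(max(mean_sq, 1e-20))
    return levels


# Two tones with the same amplitude but very different weighted levels
sr = 22050
t = np.arange(sr) / sr
mid_tone = a_weighted_level_db(0.5 * np.sin(2 * np.pi * 1000 * t), sr)
low_tone = a_weighted_level_db(0.5 * np.sin(2 * np.pi * 100 * t), sr)
print(mid_tone.mean(), low_tone.mean())
```

The 100 Hz tone comes out far lower than the 1 kHz tone even though both have the same plain RMS, which is the kind of difference an unweighted RMS hides. Averaging the per-frame levels over a clip gives one number per clip. Note that this is still a rough proxy: it ignores the temporal effects described above.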

Peter Mortensen
Jon Nordby
  • 1
    "You should at least apply frequency weighting." You should acknowledge that LUFS, mentioned by OP, is already a frequency-weighted loudness metering algorithm. – Edward Jul 02 '23 at 18:19