I need to find the difference in loudness between two files (A and B), expressed in decibels, so that I can pass it to an application as an argument and have file A play back at a level similar to file B's.

Borrowing code from this answer, I have a function that extracts the volume from both audio and video files:

import numpy as np
from moviepy.editor import AudioFileClip

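def get_volume(fname):
    clip = AudioFileClip(fname)
    # samples of the one-second chunk starting at second i
    cut = lambda i: clip.subclip(i, i + 1).to_soundarray()
    # RMS level of a chunk
    volume = lambda array: np.sqrt(((1.0 * array) ** 2).mean())
    # RMS of the loudest one-second chunk
    return np.array([volume(cut(i)) for i in range(0, int(clip.duration - 2))]).max()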

With this code I can extract values out of both audio and video files:

# in this example, the video file is louder than the audio file
A = get_volume(<path_to_some_video_file>) # 0.12990663749524628
B = get_volume(<path_to_some_audio_file>) # 0.10334934486576164

delta = A - B # 0.02655729262948464

In this example the video file A is louder than the audio file B. I need to convert the delta to decibels so that, in a CLI, I can pass that value as an argument to either boost or reduce the audio output, so that file A plays back at a volume that matches file B's.

# CLI example command where, let's say, the delta (0.026...) is converted to -12 dB
# so that the video file's volume matches the audio file's loudness
<my application> -volume -12 <path_to_file_video_file>

QUESTION

What is the proper way to take the difference between both files' audio levels and express it in decibels?

Fnord

3 Answers

There are a few things to consider when dealing with the "loudness" of an audio file. You can examine either the peak value of the signal or an average value, which in signal processing is usually the RMS (root mean square). Your code example computes the RMS of one-second chunks and then keeps only the maximum, so the result reflects the loudest second of the file. This might not be representative of files that are quiet overall but contain a few very loud passages.
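To illustrate the difference between peak and RMS, here is a minimal sketch on a synthetic signal. The helper names `peak_db` and `rms_db` and the test signal are made up for this example, and samples are assumed to be floats scaled to the range -1.0 to 1.0.

```python
import numpy as np

def peak_db(samples):
    """Peak level in dB relative to full scale (1.0)."""
    return 20 * np.log10(np.max(np.abs(samples)))

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(samples))))

# Hypothetical test signal: one second of a quiet 440 Hz tone at 44.1 kHz
# with a single full-scale click in the middle.
rate = 44100
t = np.arange(rate) / rate
quiet_with_click = 0.01 * np.sin(2 * np.pi * 440 * t)
quiet_with_click[rate // 2] = 1.0

print(peak_db(quiet_with_click))  # 0 dB: the single click sets the peak
print(rms_db(quiet_with_click))   # far lower: most of the signal is quiet
```

The peak reading says the file is at full scale, while the RMS reading shows that it is quiet almost everywhere.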

To deal with these issues, the concept of human-perceived loudness has been introduced. The Wikipedia articles on Audio normalization and EBU R 128 are good starting points for reading.

Having said that, I would recommend using an external library or tool for normalizing audio. There are several filters for exactly this purpose available in ffmpeg, and also in the tool ffmpeg-normalize.
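As one possible sketch of the external-tool route, ffmpeg's `volumedetect` filter reports `mean_volume` and `max_volume` in dB in its log output. The helpers below (`volumedetect_command`, `parse_volumedetect` and `measure` are hypothetical names) build such a command and parse the log. This assumes ffmpeg is on the PATH and that the log lines look like `mean_volume: -27.3 dB`.

```python
import re
import subprocess

def volumedetect_command(path):
    """Build an ffmpeg command that analyses the audio without writing a file."""
    return ["ffmpeg", "-hide_banner", "-i", path,
            "-af", "volumedetect", "-vn", "-f", "null", "-"]

def parse_volumedetect(log_text):
    """Extract mean_volume and max_volume (in dB) from ffmpeg's log output."""
    values = {}
    for key in ("mean_volume", "max_volume"):
        match = re.search(key + r":\s*(-?\d+(?:\.\d+)?) dB", log_text)
        if match:
            values[key] = float(match.group(1))
    return values

def measure(path):
    """Run ffmpeg (assumed to be installed) and return the parsed levels."""
    result = subprocess.run(volumedetect_command(path),
                            capture_output=True, text=True)
    # ffmpeg writes its log, including the filter report, to stderr
    return parse_volumedetect(result.stderr)
```

Subtracting the `mean_volume` of one file from that of the other would then give a level difference in dB.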

Gerd

The mathematical solution I was looking for can be found in the madmom package. The proper terms I had to search for were attenuation and gain, where attenuation is lowering a signal and gain is boosting the signal, both by a value expressed in decibels.

For my purposes I needed an integer value ranging from -60 dB to 60 dB (121 possible values), so I wrote the following function to find the nearest appropriate value:

import numpy as np
from madmom.audio.signal import attenuate

def get_attenuation(signal, reference):
    # candidate attenuations in dB, from -60 dB to +60 dB in 1 dB steps
    # (positive values lower the signal, negative values boost it)
    candidates = np.linspace(-60, 60, 121)
    samples = np.array([attenuate(signal, x).max() for x in candidates])

    # return the dB value whose attenuated signal is nearest to the reference
    index = np.argmin(np.abs(samples - reference))
    return int(candidates[index])

And so using the values A and B in my example I now obtain:

get_attenuation(A, B) # 2, i.e. file A is about 2 dB louder than file B

I can now pass that argument to my app and all works!

For more precision, use more samples and don't round to an integer. It may also be worth looking at the source of madmom.audio.signal.attenuate for a more direct and precise mathematical solution.
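A closed-form version of the same search might look like the sketch below. It assumes the usual amplitude convention, in which attenuating by x dB divides the level by 10^(x/20). The function name `attenuation_db` and the `limit` parameter are made up for this example.

```python
import numpy as np

def attenuation_db(signal_level, reference_level, limit=60.0):
    """Attenuation in dB that brings signal_level to reference_level.

    Positive result: the signal is louder than the reference and must be lowered.
    Negative result: the signal is quieter and must be boosted.
    The result is clamped to the range [-limit, +limit].
    """
    db = 20 * np.log10(signal_level / reference_level)
    return float(np.clip(db, -limit, limit))

A = 0.12990663749524628  # level of the video file from the question
B = 0.10334934486576164  # level of the audio file from the question
print(round(attenuation_db(A, B), 2))  # roughly 1.99
```

This needs no search grid, so its precision does not depend on the number of samples.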

Fnord

The OP's answer uses a brute-force method: it attenuates the signal by a range of values and compares each trial to the reference to find the closest match. This approach has several drawbacks, most notably poor performance, and in most cases it cannot find the exact answer.

The direct approach is to compute the RMS value of the entire signal and then convert that result to dB.

rms = np.sqrt(np.mean(np.square(signal)))
dB  = 20*np.log10(rms)
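To get the difference between two files, the same calculation can be applied to each signal and the two dB values subtracted. The sketch below uses synthetic tones as stand-ins for the decoded audio; `rms_db` and `level_difference_db` are hypothetical helper names.

```python
import numpy as np

def rms_db(signal):
    """RMS level of the whole signal, in dB."""
    samples = np.asarray(signal, dtype=np.float64)
    return 20 * np.log10(np.sqrt(np.mean(np.square(samples))))

def level_difference_db(signal_a, signal_b):
    """How many dB louder signal_a is than signal_b (negative if quieter)."""
    return rms_db(signal_a) - rms_db(signal_b)

# Synthetic stand-ins for two decoded files: the same tone at two amplitudes.
rate = 44100
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440 * t)
loud = 0.5 * tone
quiet = 0.25 * tone

print(level_difference_db(loud, quiet))  # about 6.02 dB (amplitude ratio of 2)
```

To play the louder file at the level of the quieter one, it would be lowered by this difference, assuming the player interprets its volume argument as a gain in dB.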
jaket