I am overlaying a bunch of audio segments, and want to be able to pass a tuple of values in the form of (1, 1, 1, 0.5, 0...) to my function, each number being a ratio that the volume of a segment should be scaled to. 0 should be absolutely silent, while 1 should be the original volume unmodified, and 0.5 exactly half. This is, as far as I understand, the behavior of the GainNode "gain" property.

I tried these so far:

def adjust_volume(audio_segment, ratio):
    decibel = pydub.utils.ratio_to_db(audio_segment.rms)
    return audio_segment - decibel * (1 - ratio)

and

SILENCE_THRESHOLD = -120.00
def adjust_volume(audio_segment, ratio):
    difference = SILENCE_THRESHOLD - audio_segment.dBFS
    return audio_segment + (difference - (difference * ratio))

Unfortunately both work imperfectly, meaning they don't exactly replicate the browser's (Mozilla Firefox) behavior. Using the first one it's possible to hear sounds with my audio player (foobar2000) even if I pass in a tuple only containing 0s, and while the second one manages to silence entire segments with the correct silence threshold, using 0.3 for example creates an audio level that is way lower than the one I can observe in my browser using the same value.

It should be noted that my technical audio knowledge is very limited. Are these simply technical inaccuracies caused by different audio equipment, audio implementation details, etc.? If that's the case, could someone suggest the most "correct" way to do this scaling?

cryzed

1 Answer

My first question is what 0.5 means, exactly. The loudness of a sound is logarithmic (each time you double the amplitude, i.e. the height of the signal, it sounds louder by an equal amount).

That said, does 0.5 simply reduce the amplitude by half? If so, that would be about 6dB quieter (halving the amplitude is roughly -6dB, while halving the power is roughly -3dB; I always confuse the two, haha). Or is 0.5 halfway between silent and maximum loudness?
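To make the amplitude vs. power distinction concrete, here is a minimal sketch using only the standard library. The function names are made up for illustration and are not part of pydub:

```python
import math

def amplitude_ratio_to_db(ratio):
    # Amplitude ratios are converted with 20 * log10(ratio)
    return 20 * math.log10(ratio)

def power_ratio_to_db(ratio):
    # Power ratios are converted with 10 * log10(ratio)
    return 10 * math.log10(ratio)

half_amplitude_db = amplitude_ratio_to_db(0.5)  # about -6.02
half_power_db = power_ratio_to_db(0.5)          # about -3.01
```

So "half the amplitude" and "half the power" differ by about 3dB, which is why it matters which one a gain value of 0.5 refers to.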

Anyway, if you want silence in pydub, reducing the volume by 120dB should do it. The maximum dynamic range humans can hear is 140dB, but CD audio (16 bit) is about 90dB.
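As a rough illustration of why a 120dB cut is effectively silence for 16-bit audio, the sketch below scales the loudest possible 16-bit sample by a dB amount. It assumes signed 16-bit samples and simple rounding; how a real library rounds is an implementation detail not modelled here, and `scale_sample` is a hypothetical helper:

```python
import math

BIT_DEPTH = 16
FULL_SCALE = 2 ** (BIT_DEPTH - 1) - 1  # 32767, the largest signed 16-bit sample

# Rule-of-thumb dynamic range: about 6dB per bit
dynamic_range_db = 20 * math.log10(2 ** BIT_DEPTH)  # about 96.3

def scale_sample(sample, gain_db):
    # Convert the dB change back to a linear factor and round to an integer sample
    factor = 10 ** (gain_db / 20)
    return int(round(sample * factor))

loudest_after_cut = scale_sample(FULL_SCALE, -120.0)  # rounds to 0
loudest_after_6db = scale_sample(FULL_SCALE, -6.0)    # roughly half of full scale
```

The rule of thumb gives a figure in the same ballpark as the ~90dB quoted above. Either way, a 120dB reduction pushes even the loudest sample below the smallest value a 16-bit file can store.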

pydub provides helper functions for fading between two volumes as well as just applying gain:

from pydub import AudioSegment
from pydub.utils import ratio_to_db, db_to_float

sound = AudioSegment.from_file('/your/file.wav')

# this is roughly -6.0
half_amplitude_in_db = ratio_to_db(0.5)

# these are all roughly the same result
half_amplitude1 = sound.apply_gain(half_amplitude_in_db)
half_amplitude2 = sound.apply_gain(-6.0)
half_amplitude3 = sound - 6.0

# Assuming 16-bit sound, that’s ~90dB dynamic range.
# so -45dB is half way to silent.
# Note: that is A LOT quieter
half_way_to_silent = sound - 45.0

Hope this helps.

Note: looking at the spec I think you need to do this:

web_API_gain_value = 0.5

gain_in_db = ratio_to_db(web_API_gain_value)

sound_after_gain = sound.apply_gain(gain_in_db)
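
For the tuple of ratios from the question, a sketch along these lines might work. It uses only the standard library so it runs without pydub; `gain_to_db` and `SILENCE_DB` are hypothetical names, and the -120dB floor for a gain of 0 is an assumption based on the "reduce by 120dB" suggestion above (the logarithm of 0 is undefined, so 0 needs special handling):

```python
import math

SILENCE_DB = -120.0  # assumed "silent enough" floor, not a pydub constant

def gain_to_db(gain, floor_db=SILENCE_DB):
    """Convert a linear amplitude gain (GainNode-style) to a dB change."""
    if gain <= 0:
        # log10(0) is undefined, so treat zero gain as the silence floor
        return floor_db
    return max(20 * math.log10(gain), floor_db)

ratios = (1, 1, 1, 0.5, 0)
gains_db = [gain_to_db(r) for r in ratios]

# With pydub, each value could then be applied to the matching segment, e.g.:
#     adjusted = segment.apply_gain(db)
```

A gain of 1 maps to a 0dB change (unmodified), 0.5 to about -6dB, and 0 to the silence floor.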
Jiaaro
  • Thank you, that did the trick! I was able to replicate the behavior of the website in question exactly by using the note you provided -- I was using the ratio_to_db function entirely wrongly, since I had no clue at all what it actually did. As I said, I don't have much background in audio processing etc., but your explanation cleared some things up for me. I am still a bit confused about the 16 bit comment. What do you mean by "is about 90dB" -- that it can only store that amplitude, i.e. the maximum loudness? And if I actually had 16-bit files, wouldn't reducing them by 120dB be "too much" then? – cryzed Feb 08 '15 at 01:36