I have a few closely related questions. The main one is how to convert the amplitude of an audio file to the dB scale. I am currently doing it as shown below, but I am not sure it is correct:
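import numpy as np
import librosa

y, sr = librosa.load('audio.wav')  # y: waveform samples, sr: sample rate (22050 Hz by default)
S = np.abs(librosa.stft(y))  # magnitude spectrogram, shape (frequency bins, frames)
db_max = librosa.amplitude_to_db(S, ref=np.max)
db_median = librosa.amplitude_to_db(S, ref=np.median)
db_min = librosa.amplitude_to_db(S, ref=np.min)
# average over frequency bins (axis 0) to get one dB value per frame
db_max_AVG = np.mean(db_max, axis=0)
db_median_AVG = np.mean(db_median, axis=0)
db_min_AVG = np.mean(db_min, axis=0)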
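For reference, below is a small NumPy function that reproduces what librosa.amplitude_to_db is documented to compute, so the effect of ref is easy to see. It is a sketch based on the documented behaviour (defaults amin=1e-5 and top_db=80.0), not librosa's own code, and the name amplitude_to_db_sketch is hypothetical. Each value becomes 20·log10(|S| / ref), so whichever statistic is passed as ref (max, median or min) is the level that maps to 0 dB.

```python
import numpy as np

def amplitude_to_db_sketch(S, ref=1.0, amin=1e-5, top_db=80.0):
    """Hypothetical re-implementation of amplitude -> dB conversion.

    Follows the documented behaviour of librosa.amplitude_to_db:
    20 * log10(max(amin, |S|) / max(amin, ref)), then everything more
    than top_db below the peak is raised to (peak - top_db).
    """
    magnitude = np.abs(np.asarray(S, dtype=float))
    # ref may be a number or a function such as np.max, np.median or np.min
    ref_value = ref(magnitude) if callable(ref) else abs(ref)
    db = 20.0 * np.log10(np.maximum(amin, magnitude))
    db -= 20.0 * np.log10(max(amin, ref_value))
    if top_db is not None:
        # limit the dynamic range to at most top_db decibels
        db = np.maximum(db, db.max() - top_db)
    return db

S = np.array([[1.0, 0.5], [0.1, 0.01]])
print(amplitude_to_db_sketch(S, ref=np.max))
```

With ref=np.max every value is at or below 0 dB; with ref=np.min every value is at or above 0 dB. The shape of the output is always the shape of S, i.e. one value per frequency bin and frame.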
How can I convert 'y' itself to the dB scale? Isn't 'y' the amplitude? Also, 'y' and 'db_max_AVG' do not have the same shape: 'db_max_AVG' has 9137 elements, while 'y' has 4678128. Another question: my audio file is 3 minutes and 32 seconds long, and the shape of 'y' is:
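Two points that may explain this, stated as assumptions to check against the librosa docs. First, 'y' holds signed sample values (floats, typically in [-1, 1] after librosa.load), so a per-sample dB value needs the magnitude: 20·log10(|y|) gives the level relative to full scale (dBFS, ref = 1.0). Second, librosa.stft with its defaults (n_fft=2048, hop_length=512, center=True) produces 1 + len(y) // hop_length frames, so 'db_max_AVG' has one value per frame rather than one per sample. The helper names below are hypothetical.

```python
import numpy as np

def waveform_to_dbfs(y, amin=1e-5):
    """Hypothetical helper: per-sample level in dB relative to full scale (1.0)."""
    # samples are signed, so take the magnitude first; amin avoids log10(0)
    return 20.0 * np.log10(np.maximum(amin, np.abs(y)))

def n_stft_frames(n_samples, hop_length=512):
    """Number of frames from a centered STFT (librosa's default center=True)."""
    return 1 + n_samples // hop_length

def frame_times(n_frames, sr=22050, hop_length=512):
    """Time in seconds at the center of each frame."""
    return np.arange(n_frames) * hop_length / sr

print(n_stft_frames(4678128))  # 9137, the length of db_max_AVG
```

If the goal is a per-frame loudness curve with the same length as 'db_max_AVG', computing librosa.feature.rms(y=y) (same default frame and hop lengths) and passing it to librosa.amplitude_to_db is a common alternative to averaging dB values over frequency.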
print(y.shape)
(4678128,)
I do not know what this number represents, because it clearly does not correspond to milliseconds or microseconds. Below are two plots of 'y' made with different methods:
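import matplotlib.pyplot as plt
import librosa.display
plt.plot(y)  # x-axis: sample index
plt.show()
librosa.display.waveshow(y, sr=sr)  # x-axis in seconds; waveplot was removed in librosa 0.10
plt.show()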
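On the number 4678128: as far as I can tell it is a count of samples, not a time unit. librosa.load resamples to 22050 Hz unless sr=None is passed, so the duration is len(y) / sr. A small sketch of that arithmetic (the function names are hypothetical):

```python
import numpy as np

def samples_to_duration(n_samples, sr=22050):
    """Return (total_seconds, minutes, seconds) for n_samples at sr Hz."""
    total = n_samples / sr
    minutes, seconds = divmod(total, 60)
    return total, int(minutes), seconds

def sample_times(n_samples, sr=22050):
    """Time in seconds of every sample, usable as the x-axis for plt.plot."""
    return np.arange(n_samples) / sr

total, minutes, seconds = samples_to_duration(4678128)
print(f"{total:.2f} s = {minutes} min {seconds:.2f} s")  # 212.16 s = 3 min 32.16 s
```

This matches the 3 minutes and 32 seconds of the file. Plotting with plt.plot(sample_times(len(y), sr), y) puts seconds on the x-axis, which is essentially what librosa's display functions do when they are given sr.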