I am trying to write a program that tells me when a note has been played.
I have the following notes exported as a .wav file (the C major scale played four times with different rhythms, dynamics, and octaves):
I can get the volumes of my sound file using the following code:
from scipy.io import wavfile

def get_volume(file):
    sr, data = wavfile.read(file)
    if data.ndim > 1:
        data = data[:, 0]
    return data

volumes = get_volume("FILE")
Here is some information about the output:
Max: 27851
Min: -25664
Mean: -0.7569383391943734
A Sample from the array: [ -7987 -8615 -8983 -9107 -9019 -8750 -8324 -7752 -7033 -6156
-5115 -3920 -2610 -1245 106 1377 2520 3515 4364 5077
5659 6113 6441 6639 6708 6662 6518 6288 5962 5525
4963 4265 3420 2418 1264 -27 -1429 -2901 -4388 -5814
-7101 -8186 -9028 -9614 -9955 -10077 -10012 -9785 -9401 -8846]
And here is what I get when I plot the volumes array (x is the index, y is the volume):
I want to get the indices of the start and end of each note, like the ones marked in the image (I marked them by hand, so they are not accurate):
When I looked at the data, I realized that it is a 1D array, and I noticed that when a note gets louder or quieter, the change is not smooth. It zigzags, but there is still a trend. So I can't just take the gradient (slope) at each point. Instead, I thought about grouping the samples into batches, computing the average gradient of each batch, and doing the calculations with that, like so:
def get_average_gradient(arr):
    # Calculates average gradient
    return sum([i - (sum(arr) / len(arr)) for i in arr]) / len(arr)

def get_note_start_end(arr_size, batch_size, arr):
    # Finds start and end indices
    ranges = []
    curr_range = [0]
    prev_slope = curr_slope = "NO SLOPE"
    has_ended = False
    for i, j in enumerate(arr):
        if j > 0:
            curr_slope = "INCREASING"
        elif j < 0:
            curr_slope = "DECREASING"
        else:
            curr_slope = "NO SLOPE"
        if prev_slope == "DECREASING" and not has_ended:
            if i == len(arr) - 1 or arr[i + 1] < 0:
                if curr_slope != "DECREASING":
                    curr_range.append((i + 1) * batch_size + batch_size)
                    ranges.append(curr_range)
                    curr_range = [(i + 1) * batch_size + batch_size + 1]
                    has_ended = True
        if has_ended and curr_slope == "INCREASING":
            has_ended = False
        prev_slope = curr_slope
    ranges[-1][-1] = arr_size - 1
    return ranges

def get_notes(batch_size, arr):
    # Gets the gradients of the batches
    out = []
    for i in range(0, len(arr), batch_size):
        if i + batch_size > len(arr):
            gradient = get_average_gradient(arr[i:])
        else:
            gradient = get_average_gradient(arr[i: i+batch_size])
        # print(gradient, i)
        out.append(gradient)
    return get_note_start_end(len(arr), batch_size, out)

notes = get_notes(128, volumes)
The problem with this is that if the batch size is too small, it returns the indices of small peaks that aren't notes on their own. If the batch size is too big, the program misses the start and end indices.
I also tried to find the notes by detecting the silence between them. Here is the code I used:
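One thing worth checking in the batch approach: the deviations of a batch from its own mean always sum to zero (up to rounding), so get_average_gradient cannot carry any trend information. The raw samples also swing around zero within every cycle of the waveform, which is the zigzag described above. A possible alternative per-batch quantity is the RMS level of each frame, which follows loudness rather than the oscillation itself. The following is only a minimal sketch: the name rms_envelope and the frame_size and hop values are assumptions, not something taken from the code above.

```python
import numpy as np

def rms_envelope(samples, frame_size=1024, hop=512):
    """Return one RMS loudness value per frame of `samples`.

    `frame_size` and `hop` are illustrative defaults (assumptions);
    they would need tuning for real recordings.
    """
    # Convert to float first so squaring int16 samples cannot overflow.
    x = np.asarray(samples, dtype=np.float64)
    # Number of full frames that fit (at least one, possibly shorter, frame).
    n_frames = 1 + max(0, (len(x) - frame_size) // hop)
    env = np.empty(n_frames)
    for k in range(n_frames):
        frame = x[k * hop : k * hop + frame_size]
        env[k] = np.sqrt(np.mean(frame ** 2))
    return env
```

Calling rms_envelope(volumes) would give a much shorter array in which frame k covers samples k * hop to k * hop + frame_size. Overlapping frames (hop smaller than frame_size) soften the trade-off between small and large batches, although they do not remove it.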
from pydub import AudioSegment, silence
audio = intro = AudioSegment.from_wav("C - Major - Test.wav")
dBFS = audio.dBFS
notes = silence.detect_nonsilent(audio, min_silence_len=50, silence_thresh=dBFS-10)
This worked the best, but it still wasn't good enough. Here is what I got:
It identified some notes pretty well, but it couldn't identify notes accurately if they didn't become very quiet before the next one was played (as in the second and fourth scales).
I have been thinking about this problem for days and have tried most, if not all, of the good(?) ideas I had. I am new to analysing audio files. Maybe I am using the wrong data for what I want to do. Maybe I need to use the frequency data (I tried getting it, but couldn't make sense of it). Here is the frequency code:
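Silence detection needs the level to drop below a threshold between notes, which does not happen when notes overlap or follow each other closely. One idea that does not depend on silence is to look for sudden rises in level instead. The sketch below is an assumption-laden illustration, not a finished onset detector: detect_onsets and all of its parameters (rise_db, floor_db, min_gap_s and the frame sizes) are made-up names and values.

```python
import numpy as np

def detect_onsets(samples, sr, frame_size=1024, hop=512,
                  rise_db=6.0, floor_db=-50.0, min_gap_s=0.1):
    """Return approximate start indices of notes (in samples).

    An onset is reported where the frame level rises by at least
    `rise_db` from one frame to the next. All thresholds are guesses.
    """
    x = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(x)) if len(x) else 0.0
    if peak == 0:
        return []
    x = x / peak  # normalise so levels are relative to the loudest sample

    # RMS level of each frame, converted to dB with a floor to avoid log(0).
    n_frames = 1 + max(0, (len(x) - frame_size) // hop)
    rms = np.array([np.sqrt(np.mean(x[k * hop : k * hop + frame_size] ** 2))
                    for k in range(n_frames)])
    level_db = 20 * np.log10(np.maximum(rms, 10 ** (floor_db / 20)))

    rise = np.diff(level_db)  # rise[k] = level_db[k + 1] - level_db[k]
    onsets = []
    last = -np.inf
    for k in np.flatnonzero(rise >= rise_db):
        start = (k + 1) * hop  # first sample of the louder frame
        # Candidates closer than min_gap_s to the previous candidate are
        # treated as part of the same attack and skipped.
        if start - last >= min_gap_s * sr:
            onsets.append(int(start))
        last = start
    return onsets
```

The returned indices are only accurate to roughly one frame length, and a soft note following a loud one can still be missed, because its attack may not raise the overall level by rise_db. The sample rate sr is the first value returned by wavfile.read.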
from scipy.fft import rfft, rfftfreq
from scipy.io import wavfile
import matplotlib.pyplot as plt

def get_freq(file, start_time, end_time):
    sr, data = wavfile.read(file)
    if data.ndim > 1:
        data = data[:, 0]
    # Fourier Transform
    N = len(data)
    yf = rfft(data)
    xf = rfftfreq(N, 1 / sr)
    return xf, yf

FILE = "C - Major - Test.wav"
plt.plot(*get_freq(FILE, 0, 10))
plt.show()
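The transform above covers the whole file at once, so it can show which frequencies occur but not when they occur. It also returns complex values, so the magnitude (np.abs) is what would normally be plotted. A short-time Fourier transform applies the same idea to one short frame at a time. The following sketch uses scipy.signal.stft to estimate the strongest frequency in each frame. The function name dominant_frequencies and the frame_size default are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import stft

def dominant_frequencies(samples, sr, frame_size=2048):
    """Return (times, freqs): the strongest frequency in each STFT frame.

    `times` is in seconds, `freqs` in Hz. `frame_size` is an
    illustrative default, not a recommended setting.
    """
    x = np.asarray(samples, dtype=np.float64)
    # stft uses a Hann window with 50% overlap by default.
    freqs, times, Z = stft(x, fs=sr, nperseg=frame_size)
    magnitude = np.abs(Z)  # shape: (n_freqs, n_frames)
    return times, freqs[np.argmax(magnitude, axis=0)]
```

A change in the dominant frequency from one frame to the next hints at a new note even when the loudness barely changes. The frequency resolution is sr / frame_size, so longer frames separate neighbouring pitches better but locate note boundaries less precisely. In silent frames the strongest bin is arbitrary, so this would need to be combined with a loudness check.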
And here is the .wav file: https://drive.google.com/file/d/1CERH-eovu20uhGoV1_O3B2Ph-4-uXpiP/view?usp=sharing
Any help is appreciated :)