I have the following audio file.
When I plot it using the following code, I get this:
import librosa
import matplotlib.pyplot as plt

audio_data, sr = librosa.load('test.wav')
plt.plot(audio_data)
plt.show()
I am trying to get the number of segments in this audio. For this example, there are three distinct segments.
Here is what I have tried so far:
I set a minimum of 0.
Then I use two pointers, i and j.
I iterate through the data with i, and if I see a difference from the minimum greater than 0.05, I set a variable called in_segment to true and start iterating from that point onwards with j. If I see a difference less than 0.025, I stop, increment my segment count by 1, and restart the process from that point.
It didn't work, so I decided to drop the negative values from the array, but I am still not getting 3 as the output. I get 1620.
Here is the code:
audio_data, sr = librosa.load(audio_data)
segments = 0
in_segment = False
end_segment = False
audio_data = audio_data[audio_data > 0]  # np.abs(audio_data)
n = len(audio_data)
minimum = 0  # np.min(audio_data)
for i in range(n):
    diff = audio_data[i] - minimum
    if (diff > 0.05):
        in_segment = True
        for j in range(i, n):
            diff = audio_data[j] - minimum
            if (diff < 0.05 and in_segment):
                end_segment = True
                in_segment = False
                break
    if (end_segment):
        segments += 1
        end_segment = False
        i = j
print(segments)
I expect to get 3 as the answer, but I am not sure how to fix this code. I suspect there are sudden spikes, which is what is causing the error. I also tried taking the absolute values of the array, but that did not work. Does anyone know how to fix this, or know of a library that can count the rises in the data?
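For reference, the idea described above (enter a segment when the level rises above one threshold, leave it when the level falls below a lower one) could be sketched roughly as follows. This is only a minimal illustration, not a tested solution for the real file: the function name count_segments, the moving-average window, and the toy signal are all hypothetical, and the thresholds 0.05 and 0.025 are simply the ones mentioned in the question. It smooths the absolute amplitude first so that single spikes and zero crossings inside a burst do not start or end a segment, and it uses a single pass with a state flag rather than reassigning the loop variable (in Python, `i = j` inside `for i in range(n)` has no effect on the next iteration).

```python
from collections import deque


def count_segments(samples, window=5, high=0.05, low=0.025):
    """Count runs where the smoothed |amplitude| rises above `high`
    and later falls below `low` (two-threshold hysteresis)."""
    recent = deque(maxlen=window)  # last `window` absolute values
    running_sum = 0.0
    segments = 0
    in_segment = False
    for x in samples:
        if len(recent) == window:
            running_sum -= recent[0]  # this value is evicted by the append below
        recent.append(abs(x))
        running_sum += abs(x)
        envelope = running_sum / len(recent)  # moving average of |x|
        if not in_segment and envelope > high:
            in_segment = True
            segments += 1  # count the segment on its rising edge
        elif in_segment and envelope < low:
            in_segment = False
    return segments


# Hypothetical toy signal: three oscillating bursts separated by silence.
silence = [0.0] * 20
toy = (silence + [0.5, -0.5] * 10 + silence + [0.4, -0.4] * 10
       + silence + [0.6, -0.6] * 10 + silence)
print(count_segments(toy))  # 3
```

For real audio the window would presumably need to be much larger, for example some fraction of sr such as `window=sr // 10`, and the thresholds would need tuning against the plot. As a library alternative, `librosa.effects.split(audio_data, top_db=...)` returns an array of non-silent intervals, so its length may give the segment count once `top_db` is tuned.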