
I have the following audio file

When I plot it using the following code, I get this:

import librosa
import matplotlib.pyplot as plt

audio_data, sr = librosa.load('test.wav')
plt.plot(audio_data)
plt.show()

Wave plot of audio

I am trying to get the number of segments in this audio. For this example, there are three distinct segments.

Here is what I have tried so far:

I set a minimum of 0.

Then I use two pointers, i and j.

I iterate through the data with i, and if I see a difference (sample value minus the minimum) greater than 0.05, I set a variable called in_segment to true and start iterating from that point onwards with j. If I see a difference less than 0.025, I stop, increment my segment count by 1, and restart the process from that point.

It didn't work, so I decided to get rid of the negative values in the array, but I am still not getting 3 as the output. I get 1620.

Here is the code:

import librosa

# Load the same file that was plotted above
audio_data, sr = librosa.load('test.wav')
segments = 0
in_segment = False
end_segment = False

# Keep only the positive samples (also tried np.abs(audio_data))
audio_data = audio_data[audio_data > 0]
n = len(audio_data)
minimum = 0  # also tried np.min(audio_data)

for i in range(n):
    diff = audio_data[i] - minimum
    if diff > 0.05:
        # Start of a segment: scan forward with j until the level drops
        in_segment = True
        for j in range(i, n):
            diff = audio_data[j] - minimum
            if diff < 0.05 and in_segment:
                end_segment = True
                in_segment = False
                break

        if end_segment:
            segments += 1
            end_segment = False
            i = j  # meant to resume the outer scan from j

print(segments)

I expect to get 3 as the answer, but I am not sure how to fix this code. I suspect there are sudden spikes, which is what is causing the error. I also tried the absolute values of the array, but that did not work. Does anyone know how to fix this, or know of a library that can count the rises in the data?
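
One possible direction, offered as a sketch under assumptions and not as a verified fix for this particular file. Two things in the loop above are likely to inflate the count: reassigning `i = j` inside a Python `for` loop does not skip ahead (the `range` iterator supplies the next value regardless), and a raw waveform crosses any fixed threshold many times within one burst. The sketch below smooths the absolute amplitude with a moving average and then counts rises using two thresholds (hysteresis). The function name `count_segments` and the parameters `window`, `high`, and `low` are hypothetical; the defaults 0.05 and 0.025 are simply the numbers mentioned in the question.

```python
from collections import deque


def count_segments(samples, window=512, high=0.05, low=0.025):
    """Count bursts in a signal using a smoothed envelope and hysteresis.

    samples: any iterable of floats (a NumPy array also works).
    window:  number of samples in the moving average (assumed value).
    high:    envelope level that opens a segment (assumed value).
    low:     envelope level that closes a segment (assumed value).
    """
    recent = deque()      # absolute values currently inside the window
    running_sum = 0.0     # sum of the values in `recent`
    in_segment = False
    segments = 0

    for x in samples:
        value = abs(float(x))
        recent.append(value)
        running_sum += value
        if len(recent) > window:
            running_sum -= recent.popleft()

        # Moving average of |x|: a crude amplitude envelope
        envelope = running_sum / len(recent)

        if not in_segment and envelope > high:
            # Rising edge: count the segment once, when it starts
            in_segment = True
            segments += 1
        elif in_segment and envelope < low:
            # Falling edge: only leave once the level is clearly low
            in_segment = False

    return segments
```

With the data from the question this might be called as `count_segments(audio_data, window=int(0.02 * sr))` on the unfiltered array (negative samples included, since the function takes absolute values itself). The 20 ms window is a guess, and both thresholds will probably need tuning for other recordings.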

Sharhad
  • Can you use librosa onset detection? https://librosa.org/doc/latest/onset.html – jaket Dec 29 '22 at 04:13
  • It sort of worked, but I still have to first do `audio_data = audio_data[audio_data > 0.03]` and then `segments = len(librosa.onset.onset_detect(y=audio_data, sr=sr, units='time'))`. Is there a better way to pick the number 0.03? – Sharhad Dec 29 '22 at 06:18
  • Since you are trying to do this programmatically with a script, I assume that you have more than one such audio clip to detect segments in? How many such clips are there, and what are their commonalities and differences? How much and what kind of information is acceptable to input manually for each clip? – Jon Nordby Dec 31 '22 at 20:12
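
On the question in the comments about picking 0.03: one common heuristic, shown here only as a sketch, is to express the threshold relative to the loudest sample in the clip instead of hard-coding an absolute value. The helper name `relative_thresholds` and the fractions 0.1 and 0.05 are assumptions chosen for illustration, not values derived from the audio in the question.

```python
def relative_thresholds(samples, high_frac=0.1, low_frac=0.05):
    """Return (high, low) thresholds as fractions of the peak |amplitude|.

    high_frac and low_frac are assumed example values; an empty input
    yields (0.0, 0.0).
    """
    peak = max((abs(float(x)) for x in samples), default=0.0)
    return high_frac * peak, low_frac * peak
```

Scaling by the peak keeps the thresholds comparable between quiet and loud recordings, although a single loud click can still skew them. `librosa.effects.split` follows a related idea: its `top_db` parameter is measured relative to a reference level, and it returns the non-silent intervals directly.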
