How to segment series of audio recordings according to onsets/offsets specified in dataframe (pydub/python)

Question

I am trying to extract audio segments from a series of audio recordings. The onset and offset of the segment to extract from each recording are specified in a dataframe with three columns: a) the name of the sound recording, b) the onset of the segment, and c) the offset of the segment (see below):

import pandas as pd

segm_info_dic = {'Sentence': ['x', 'y', 'z'], 'Onset': [100, 200, 300], 'Offset': [200, 300, 400]}
segm_info_df = pd.DataFrame(data=segm_info_dic)

I then tried to loop over the audio recordings and the rows of the dataframe, so that each audio recording is segmented at the right point and then saved as a new recording.

for index, row in segm_info_df.iterrows():
    for sound_file in sound_list:
        sound_path = os.path.join(sound_folder, sound_file)
        sound = AudioSegment.from_wav(sound_path)
        w1 = sound[row['Onset']:row['Offset']]
        new_sound.export(os.path.join(new_folder, w1))

However, my loop does not work since only the last audio recording in the list is segmented at the right point. I have just started using Python so I am really not sure how I should set up the loop correctly. Thank you in advance!

score 0 · Answer 1 · answered Oct 29 '20 at 09:06

Shouldn't you do this instead?

for sound_file in sound_list:
    for index, row in segm_info_df.iterrows():
        sound_path = os.path.join(sound_folder, sound_file)
        sound = AudioSegment.from_wav(sound_path)
        w1 = sound[row['Onset']:row['Offset']]
        new_sound.export(os.path.join(new_folder, w1))

dspr

I tried that, but unfortunately it does not work either! – HHH Oct 29 '20 at 09:20
I cannot tell exactly what you are doing from your example, but `sound_file` seems to have no effect on the iteration through `index` and `row`, which looks a bit strange to me. Are you sure you don't have to update your dataframe for each new file? – dspr Oct 29 '20 at 10:04
As far as I can see from the code and the output recordings, the first loop correctly iterates through the audio recordings. The problem is the second loop (over the rows in the dataframe). I think that, for each audio file, the second loop creates new segments using the info from every row, from the first to the last. These files overwrite each other, so only the segment created from the last row of the dataframe is saved. How do I loop over the dataframe so that each audio recording is segmented according to the info in its corresponding row? – HHH Oct 29 '20 at 10:26
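One way to get "one segment per recording, taken from its own row" is to drop the nested loop and let each dataframe row pick its recording. The following is only a minimal sketch under some assumptions that the thread does not confirm: the 'Sentence' column holds the file name without its extension (so 'x' maps to 'x.wav'), the onsets and offsets are in milliseconds (which is how pydub slices an AudioSegment), and the `_segment` suffix for the output name is made up for illustration. `segment_jobs` and `export_segments` are hypothetical helper names, not part of pydub.

```python
import os


def segment_jobs(rows, sound_folder, new_folder, ext=".wav"):
    """Build one job per dataframe row, pairing the row with its own recording.

    Assumption: row["Sentence"] is the recording's file name without extension.
    """
    jobs = []
    for row in rows:
        name = row["Sentence"]
        jobs.append({
            "source": os.path.join(sound_folder, name + ext),
            "onset": int(row["Onset"]),    # assumed to be milliseconds
            "offset": int(row["Offset"]),  # assumed to be milliseconds
            # Hypothetical naming scheme: a distinct target per recording,
            # so the exported files cannot overwrite each other.
            "target": os.path.join(new_folder, name + "_segment" + ext),
        })
    return jobs


def export_segments(jobs):
    """Cut each recording at its own onset/offset and save the result."""
    # Imported here so that the jobs can be built and inspected without pydub.
    from pydub import AudioSegment

    for job in jobs:
        sound = AudioSegment.from_wav(job["source"])
        segment = sound[job["onset"]:job["offset"]]
        segment.export(job["target"], format="wav")


# Possible usage with the dataframe from the question:
# jobs = segment_jobs(segm_info_df.to_dict("records"), sound_folder, new_folder)
# export_segments(jobs)
```

Because there is a single loop over the rows, each recording is opened once and sliced only with the values from its own row. Note also that `export` is called on the sliced segment and given a file path, rather than being passed the segment itself as part of the path.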
