
Hello, is there a way to split audio files stochastically? So far I have managed to split the audio files into 10-second snippets; I would appreciate any help.

from pydub import AudioSegment
from pydub.utils import make_chunks

myaudio = AudioSegment.from_file('C:/Users/XY/Desktop/input/HouseSample.wav')
chunk_length_ms = 10000  # pydub works in milliseconds
chunks = make_chunks(myaudio, chunk_length_ms)  # make 10-second chunks
for i, chunk in enumerate(chunks):
    chunk_name = '{0}.wav'.format(i)
    print('exporting', chunk_name)
    chunk.export(chunk_name, format='wav')
Andreas Rossberg
ys034
  • Please provide more information about your specs for those splits. – joanis Apr 05 '22 at 12:47
  • What kind of randomness do you want? Same chunk length per file? Same chunk length per batch? Do you want to split each file with several chunk lengths? It's quite hard to give you an answer without more information. – NiziL Apr 05 '22 at 12:47
  • I would like to split each file into several chunk lengths. Thanks for your help in advance; I am still very new to the programming environment. – ys034 Apr 05 '22 at 12:53

1 Answer


So, let's say you want several chunk sizes per file.
In the simplest form, you'll need two things:

  • a new for loop
  • an array with all the chunk sizes
from pydub import AudioSegment
from pydub.utils import make_chunks

myaudio = AudioSegment.from_file('C:/Users/XY/Desktop/input/HouseSample.wav')
chunk_sizes = [10000]  # pydub works in milliseconds
for chunk_length_ms in chunk_sizes:
    chunks = make_chunks(myaudio, chunk_length_ms)
    for i, chunk in enumerate(chunks):
        # include the chunk length in the name so different sizes don't overwrite each other
        chunk_name = '{0}ms_{1}.wav'.format(chunk_length_ms, i)
        print('exporting', chunk_name)
        chunk.export(chunk_name, format='wav')

For now, this code produces the same split you already have.
To add multiple splits, simply add more values to the chunk_sizes array, e.g. chunk_sizes = [10000, 5000] for 10- and 5-second splits.

If you want to add some randomness, you can rely on any pseudo-random generator, such as random or numpy.random.

A small example, with 5 different splits between 5 s and 10 s:

import random

N_SPLIT = 5
chunk_sizes = []
for _ in range(N_SPLIT):
    chunk_sizes.append(random.randint(5000, 10000))

Beware: if you need these splits to be consistent across your dataset, you'll need to use the same randomized chunk_sizes array for each file, so it might be useful to set a seed here (e.g. random.seed(42)).
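A minimal sketch of that seeding idea. make_chunk_sizes is a hypothetical helper (not part of pydub); seeding a private random.Random instance means every file, and every run, gets the same list of chunk lengths:

```python
import random

def make_chunk_sizes(n_split, lo_ms=5000, hi_ms=10000, seed=42):
    # a seeded private generator: same seed -> same sizes, for every file and run
    rng = random.Random(seed)
    return [rng.randint(lo_ms, hi_ms) for _ in range(n_split)]

sizes_a = make_chunk_sizes(5)
sizes_b = make_chunk_sizes(5)
assert sizes_a == sizes_b  # reproducible across files thanks to the seed
```

You would then compute chunk_sizes once with this helper and reuse it in the splitting loop for each audio file in your dataset.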

NiziL